Chapter 1
Taming infinity 


Throughout the history of mathematics, infinity and the 
infinitesimal—the infinitely large and the infinitely small—have 
raised difficulties and paradoxes. For mathematicians, the study of 
infinity arises naturally, but in different ways from how an artist or 
theologian might contemplate it. An artist might welcome the 
challenge of representing some paradox of infinity in their work; 
for the devout, an infinite deity may be central to their faith. But 
the mathematician seeks to define, and ultimately limit, infinity, as 
mathematics cannot fully progress without rigorous and careful 
definitions. Not surprisingly, this has been a tortuous and rocky 
journey that has taken centuries, if not millennia, to resolve. If a
study of the history of mathematics teaches anything, it is an 
appreciation of the endeavour involving many people, over many 
generations, to find right ways of thinking about mathematics. And 
this is especially true in the field of analysis. 


Mathematical analysis, to some extent, seeks to rigorously define 
infinite processes that arise in mathematics, so that logical 
arguments can be made and theorems proven. I write ‘to some 
extent’ because analysis is much more than making previously 
informal mathematics precise; we will see that analysis has ideas 
and concepts all of its own, many of which were being studied 
centuries before there was a notion of analysis as a subject in its 
own right. We shall also see that the applications of analysis are
numerous across mathematics and science; indeed, many of the
roots of analysis are a result of humankind’s efforts to model the 
world around us. 


Actual and potential infinities 


The counting numbers begin 1, then 2, then 3, and so on, and are 
commonly listed as 


1, 2, 3, ...


Here the ellipsis ‘...’ signifies that the list goes on forever.
We can continue counting without end, never running out 
of numbers. Each counting number, however large, is 
itself finite, but we recognize that there are infinitely many 
counting numbers. 


The above captures two different types of infinity. A potential 
infinity is an infinite process that goes on without end. Much of 
mathematical analysis focuses on such infinite processes. When we 
say there are infinitely many counting numbers, we are referring to 
an actual infinity as if infinity itself were a number. 


Early on, we implicitly meet potential infinities. We learn that 
numbers can be represented as decimal expansions such as: 


1/3 = 0.333333333333333 ...
1/7 = 0.142857142857142 ...
π = 3.141592653589793 ...


But what exactly does all this mean? What details are hidden by 
those ellipses? This is definitely a mathematician’s question. No 
experimental scientist or engineer has ever needed to know the 
accuracy of a value to more than 15 decimal places. The value of π
given above is sufficiently accurate for space missions exploring the
solar system. So, whilst modern computers have calculated π to
trillions of places, there are few benefits of such knowledge in the
physical world.


One argument for verifying the first decimal expansion might go as 
follows. If we write 


x = 0.333333...


and then multiply both sides by 10, we get 


10x = 3.3333333... 


(as multiplying by 10 moves the decimal point one place to the 
right). Subtracting the first equation from the second, we find 
9x = 3 and hence x = 3/9 = 1/3 as claimed.


So far as it goes, the above is correct, but not the ‘whole truth’, 
until we give a precise definition of what a decimal expansion 
represents; without that, how can we be certain that the algebraic 
manipulations above are valid? 


The following infinite sum, often named after the Italian 
mathematician Guido Grandi, gives further evidence of the need 
for rigour. If we set 


y = 1 − 1 + 1 − 1 + ···,


then we might argue that


y = (1 − 1) + (1 − 1) + (1 − 1) + ··· = 0 + 0 + 0 + ··· = 0


or that


y = 1 + (−1 + 1) + (−1 + 1) + ··· = 1 + 0 + 0 + ··· = 1


and so we've shown that 0 = 1. Without proper definitions,
seemingly reasonable algebraic manipulations of an infinite sum 
can lead to nonsensical conclusions. 


The first calculation may not seem like an infinite sum, but the 
notation x = 0.3333... is just shorthand for the infinite sum


3/10 + 3/100 + 3/1000 + ···


We clearly need an explicit definition of what an infinite sum
means, so we can show that x = 1/3 but exclude contradictory sums
like Grandi’s. Let’s now look at a different way of approaching the
problem.
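

To see what is at stake, the partial sums of 3/10 + 3/100 + 3/1000 + ··· can be computed exactly. What follows is only a minimal sketch in Python; the helper name decimal_partial_sum is hypothetical and chosen purely for illustration. It suggests that the gap between 1/3 and the sum of the first n terms is 1/(3 × 10^n), which shrinks towards 0 without ever reaching it.

```python
from fractions import Fraction

def decimal_partial_sum(n):
    """Exact value of 3/10 + 3/100 + ... + 3/10**n (hypothetical helper)."""
    return sum(Fraction(3, 10**k) for k in range(1, n + 1))

# Print each partial sum next to its exact distance from 1/3.
for n in (1, 2, 5, 10):
    s = decimal_partial_sum(n)
    print(n, s, Fraction(1, 3) - s)
```

No finite stage equals 1/3, which is why a definition of the value of an infinite sum is needed before the algebra can be trusted.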


How is π calculated?


In addition to the previous algebraic methods, geometric methods 
can also be used to determine infinite sums. A square of side 1, 
and so area 1, can be divided up in two different ways (Figure 1). 
For example, Figure 1(a) shows that


1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ··· = 1


and Figure 1(b) shows that


1/4 + 1/16 + 1/64 + 1/256 + 1/1024 + ··· = 1/3.


1. Dividing up a square in two different ways (1(a), 1(b)).


In Figure 1(a), the square is divided up into rectangles and squares, 
the first region having area 1/2, and each subsequent region being
half the size of the previous region. The squares and rectangles
ultimately cover the whole square, and so the infinite sum equals 1.
In Figure 1(b), we cover the square with three collections of squares
having areas 1/4, 1/16, 1/64, ..., and so the sum of each collection’s areas
is one third that of the square, namely 1/3.
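

Both pictures correspond to finite identities that can be checked by ordinary algebra. As a sketch (the symbol n, standing for the number of regions counted so far, is introduced here only for illustration):

```latex
\[
\frac12 + \frac14 + \cdots + \frac1{2^n} = 1 - \frac1{2^n},
\qquad
\frac14 + \frac1{16} + \cdots + \frac1{4^n} = \frac13\left(1 - \frac1{4^n}\right).
\]
```

As n grows, the leftover amounts 1/2^n and 1/4^n shrink towards 0, which is what the two figures show geometrically.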


For now, we'll focus on approximating π, which naturally lends
itself to geometric methods. Recall that π is the ratio of a circle’s
circumference to its diameter, and also equals the area of a circle 
with radius 1. 


Here a pentagon has been inscribed in a circle and a hexagon
circumscribes the circle (Figure 2). As the pentagon is inside the
circle, the pentagon has a smaller area than the circle; as the
hexagon surrounds the circle, the hexagon has a larger area.
Knowing the polygons’ areas would give an underestimate and an
overestimate for π. And by using polygons with more edges, these
estimates would become progressively better.


2. A pentagon in a circle in a hexagon.


In the third century BCE, the great Greek mathematician
Archimedes found two approximations of π using such polygons.
The two polygons Archimedes used each had 96 edges, and he
showed that


3.1408... = 3 + 10/71 < π < 3 + 1/7 = 3.1428...,


thus determining π to two decimal places. The approximations of
π as 3.14 or as 3 + 1/7 = 22/7 may already be familiar. This approach was
part of a more general method of exhaustion that Archimedes 
employed to great effect, calculating areas by using approximations 
from inside and outside with regions of known areas. 
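

The squeezing of π between polygons can be imitated numerically. The sketch below is an assumption-laden illustration rather than a reconstruction of Archimedes' own working: it uses perimeters instead of the areas described above, takes a circle of diameter 1 (so that its circumference is π), starts from hexagons, and doubles the number of edges using the standard harmonic-mean and geometric-mean recurrences. The function name archimedes_bounds is hypothetical.

```python
import math

def archimedes_bounds(doublings):
    """Bounds on pi from regular polygons in and around a circle of diameter 1."""
    sides = 6
    inner = 3.0                 # perimeter of the inscribed hexagon
    outer = 2.0 * math.sqrt(3)  # perimeter of the circumscribed hexagon
    for _ in range(doublings):
        # Circumscribed polygon with twice as many edges: harmonic mean.
        outer = 2.0 * inner * outer / (inner + outer)
        # Inscribed polygon with twice as many edges: geometric mean
        # of the old inner perimeter and the new outer perimeter.
        inner = math.sqrt(inner * outer)
        sides *= 2
    return sides, inner, outer

sides, lower, upper = archimedes_bounds(4)   # 6 -> 12 -> 24 -> 48 -> 96 edges
print(sides, lower, upper)
```

Four doublings reach polygons with 96 edges, and the resulting bounds sit inside the interval from 3 + 10/71 to 3 + 1/7 quoted above.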


By the 5th century, Chinese mathematicians had calculated π to
seven decimal places using polygons with 24,576 sides. At the
start of the 20th century, the number of known places of π was
in the hundreds, whilst currently it is in the trillions. Modern
approximations of π are achieved analytically using infinite sums,
rather than geometrically. 


One of the first known expressions for π as an infinite sum is


π = 4 × (1 − 1/3 + 1/5 − 1/7 + 1/9 − 1/11 + 1/13 − ···).


This was first derived by the Indian mathematician Madhava in
the 14th century, but is also associated with James Gregory and
Gottfried Leibniz, who each independently found the sum, albeit
three centuries later. Quite what π has to do with the reciprocals of
odd numbers may seem unclear at the moment; such infinite sums
are typically evaluated using calculus, and we will explore this in
Chapter 2. If we write s_n for the sum of the first n terms of the
sum (for example, s_2 = 4 × (1 − 1/3) = 8/3 is the sum of the first
two terms), then we generate the approximations of π shown in
Table 1.


Table 1. Approximations for π using Madhava’s formula

s_2  = 2.666666...    s_100  = 3.131592...    s_10000  = 3.141492...
s_3  = 3.466666...    s_500  = 3.139592...    s_50000  = 3.141572...
s_10 = 3.041839...    s_1000 = 3.140592...    s_100000 = 3.141582...
s_50 = 3.121594...    s_5000 = 3.141392...    s_500000 = 3.141590...
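

The entries of Table 1 can be reproduced with a few lines of code. This is a minimal sketch using ordinary floating-point arithmetic; the function name madhava_partial_sum is a hypothetical choice, and the printed values are approximations rather than exact decimals.

```python
def madhava_partial_sum(n):
    """Approximate s_n = 4 * (1 - 1/3 + 1/5 - ...), using the first n terms."""
    total = 0.0
    for k in range(n):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

# A few of the values tabulated in Table 1.
for n in (2, 3, 10, 50, 100, 1000):
    print(n, madhava_partial_sum(n))
```

The successive values alternate above and below π, and the error after n terms is roughly 1/n, which is why so many terms are needed.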


The partial sums s_n get ever closer to π but only very slowly. After
half a million terms, we have only approximated π to five decimal
places. Modern estimates for π use infinite sums that approximate
π much more quickly, such as Ramanujan’s approximation (see
Appendix for details). Its first term,


9801/(2206√2) = 3.1415927300...,


is accurate to six decimal places, with each successive term giving a 
further eight decimal places of accuracy. 
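

The quoted first term is easy to check. The sketch below only evaluates that single expression in floating-point arithmetic; it does not implement the full series, whose details are left to the Appendix. The variable names are illustrative.

```python
import math

# First term of Ramanujan's approximation, as quoted above.
first_term_estimate = 9801 / (2206 * math.sqrt(2))

# Distance from the floating-point value of pi.
error = abs(first_term_estimate - math.pi)
print(first_term_estimate, error)
```

The error comes out a little under one part in ten million, consistent with six correct decimal places.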


Defining convergence 


A contentious question, recurring on the internet (pun intended!), 
is the following: 


does 0.99999... equal 1? 


The discussions become heated in the absence of any definitions.
‘Yes’ is the only reasonable one-word answer, but a full answer
requires time to explain just what 0.99999... means. 


Arguments for the negative answer might state that the terms in 
the infinite sequence 


0.9, 0.99, 0.999, 0.9999, 0.99999, ... 


are always less than 1 and so never equal 1. This is true, but the
notation 0.99999... does not represent a sequence; it represents a
number which is known as the limit of this sequence. However, the
above sequence does have an important part in defining what the
notation 0.99999... means.


The individual terms of the sequence never equal 1, but
they do become arbitrarily close to 1. That is, given any
required degree of approximation, some term of the sequence
gets that close to 1. For example, if we required a term that was
within an accuracy of 0.0035 of 1, then we'd need that term to
satisfy


0.9965 = 1 − 0.0035 < term of sequence < 1 + 0.0035 = 1.0035.


Note that the third term, 0.999, is in this range and, in fact, every 
term afterwards is in this range as well. 


And this is essentially the definition of convergence.
A mathematical textbook might introduce more general
notation, and a broader setting, but a sequence of numbers
converges to a limit if, for any required degree of accuracy,
the terms of the sequence eventually become that accurate and
remain so.
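

The definition can be tried out on the sequence 0.9, 0.99, 0.999, ... A minimal sketch follows, assuming exact fractions so that rounding plays no part; the names term and first_index_within are hypothetical. Because the terms of this particular sequence only move closer to 1, once a term is within the required accuracy every later term is too, so finding the first such term is enough here.

```python
from fractions import Fraction

def term(n):
    """n-th term of 0.9, 0.99, 0.999, ..., that is, 1 - 1/10**n."""
    return 1 - Fraction(1, 10**n)

def first_index_within(accuracy, limit=Fraction(1)):
    """Smallest n whose term lies strictly within `accuracy` of `limit`."""
    n = 1
    while abs(term(n) - limit) >= accuracy:
        n += 1
    return n

print(first_index_within(Fraction(35, 10000)))   # required accuracy 0.0035
print(first_index_within(Fraction(1, 10**12)))   # a much stricter accuracy
```

For a sequence that does not approach its limit steadily, the check would also have to confirm that all later terms stay within range, which no finite computation can do in general; that is where proof takes over.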


Returning to Grandi’s sum, we can use our definition to show that
it doesn’t converge. The partial sums are


1, 1 − 1, 1 − 1 + 1, 1 − 1 + 1 − 1, 1 − 1 + 1 − 1 + 1, ...


which evaluate to 1, 0, 1, 0, 1, ... Half of the partial sums are 1 and
half are 0. If we chose a required accuracy of 0.1, then a limit,
if it existed, would have to be within 0.1 of 1, so between 0.9
and 1.1, and the limit would also have to be within 0.1 of 0, so
between −0.1 and 0.1. No number lies in both ranges, so Grandi’s
sum has no limit.


Now that we have a clear definition, these matters of 
convergence resolve straightforwardly. Well, sort of. What I’ve 
described here is the standard definition of convergence for 
sequences. However, in 1890 the Italian analyst Ernesto Cesàro
introduced a more general notion of convergence for which 
Grandi’s sum does converge and takes a value of 1/2. Which is 
correct? Does Grandi’s sum converge or not? 


The answer is that they are both right: different definitions
can lead to different answers, and to be clear we should
state which definition is being used. Otherwise, it’s only
reasonable to expect that the standard definition of convergence
is being used.
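

The two answers can be seen side by side in a short computation. The sketch below assumes the usual description of Cesàro's notion, namely that one takes the running averages of the partial sums and asks whether those averages converge; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def grandi_partial_sums(n):
    """First n partial sums of 1 - 1 + 1 - 1 + ..."""
    terms = [(-1) ** k for k in range(n)]   # 1, -1, 1, -1, ...
    return list(accumulate(terms))

def cesaro_means(partial_sums):
    """Running averages of the partial sums, as exact fractions."""
    running_totals = accumulate(partial_sums)
    return [Fraction(total, count)
            for count, total in enumerate(running_totals, start=1)]

sums = grandi_partial_sums(10)
means = cesaro_means(sums)
print(sums)
print(means)
```

The partial sums keep jumping between 1 and 0, so they have no limit in the standard sense, while their averages settle towards 1/2.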


By way of a more extreme example, you may have seen the
following infinite sum:


1 + 2 + 3 + 4 + ··· = −1/12,


which is commonly cited by string theorists in physics. This seems
wholly ridiculous; the partial sums of 1, 3, 6, 10, ... increase forever
and are never negative. They certainly don’t converge by the
standard definition, but rather tend to infinity. Cesàro would have
agreed. However, in around 1913, Srinivasa Ramanujan
introduced an approach which assigned precisely the above value
to this infinite sum.


We shall focus only on the standard notion of convergence and
consider Cesàro’s and Ramanujan’s sums too niche for further
discussion. I include them here to make clear that context and 
choice of definition matter; they are important enough that a 
mathematical question might have different answers depending on 
how it is understood. 


Countable versus uncountable 


Potential infinities are processes that go on forever; an important 
aspect of analysis is the handling of such processes, defining clearly 
how they might resolve and how two such processes might interact. 
At this point a reasonable question would be: ‘why do we need such 
processes in the first place?’ Large parts of mathematics involve 
only finite processes. No computer has ever literally added an 
infinite number of terms together, as that would take forever. 


A potential infinity is a process like the counting of 1, 2, 3,..., 
which never ends. An actual infinity is the answer to the question 
‘how many counting numbers are there?’ During the late 19th 
century, the German mathematician Georg Cantor defined ways to 
rigorously investigate actual infinities. 


The beginnings of analysis are intimately linked with the real 
numbers. A real number is any number with a decimal expansion; 
the set of real numbers includes the counting numbers 1, 2, 3,..., 
the rational numbers (these are fractions of whole numbers such
as −1/2 and 2/3) and other irrational (= not rational) numbers such
as √2 and π. In 1874 Cantor showed that there are more real
numbers than there are counting numbers. That is, it’s impossible 
to count or list all the real numbers. Any attempted listing—first 
real number, second real number, third real number, ...—would 
necessarily omit some real numbers (most of them, in fact). A proof 
is given in the Appendix. This result speaks to the nature of the real 
numbers and why analysis needs infinite processes.


A finite set or an infinite set which can be counted is called,
unsurprisingly, countable; otherwise, it’s called uncountable. The 
counting numbers are countable; the integers (positive and 
negative whole numbers) are countable; perhaps more 
surprisingly, the rational numbers are countable. Yet more 
surprisingly, the computable numbers are countable. 
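

The countability of the rational numbers can be made concrete by writing down an explicit listing. The sketch below is one of several possible listings, chosen for simplicity and restricted to positive rationals as an assumption; it walks through the fractions p/q in order of p + q and skips any fraction not in lowest terms, so that each positive rational appears exactly once. The generator name positive_rationals is hypothetical.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def positive_rationals():
    """Yield every positive rational exactly once, in order of p + q."""
    total = 2
    while True:
        for p in range(1, total):
            q = total - p
            if gcd(p, q) == 1:        # skip repeats such as 2/2 or 2/4
                yield Fraction(p, q)
        total += 1

print(list(islice(positive_rationals(), 10)))
```

Any given positive rational p/q is reached after finitely many steps, which is exactly what it means for the list to count them; interleaving 0 and the negatives would extend the listing to all rationals.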


A computable number is one which a computer can approximate to 
any required degree. The rational numbers are all computable. 
Other numbers like π are computable; a computer could be
programmed with either Madhava’s or Ramanujan’s sum to
calculate π to any required accuracy. A computer program is just
a list of commands of finite length, written in a computer language
comprising finitely many symbols. Using Cantor’s ideas, it can
be shown that there are countably many programs. So there are
countably many computable numbers.


On the other hand, there are uncountably many real numbers, 
when understood as numbers with arbitrary decimal expansions. 
This means that some (in fact, most) real numbers cannot be 
described by finite means, and so analysis needs infinite processes 
and descriptions to deal with the real numbers. 


Most mathematicians are fine with this, but for some philosophers 
and logicians this is a great concern. There are schools of thought, 
in particular that of the intuitionists, which do not accept the
notion of arbitrary decimal expansions and so have a different 
notion of what can be validly proved in analysis, though this is not 
the broadly held view of mathematicians. Importantly, the 
uncountability of the real numbers makes clear that finite 
processes are insufficient to be able to describe them. 


Axioms and some early results 


Rigorous definitions are important to mathematicians, as we can 
employ them in rigorous proofs: carefully argued chains of logic
that begin with clear assumptions or previously demonstrated
results, making clear deductions at each step until we
reach a conclusion. In a fully understood subject, definitions
come before theory and carefully ground the assumptions 
and logical deductions made in a proof. Historically, as 
calculus emerged, the situation was much more chicken- 
and-egg as understanding progressed, and we will see that 
rigorous definitions came quite late in the narrative. Ideally 
though, definitions are the key starting points to proving 
theorems. 


A modern introduction to real analysis would likely begin
with some axioms of the real numbers; an axiom is an
assumed rule that is considered self-evident, and doesn’t
need to be proved. After all, without making some assumptions,
nothing can be proved. For example, the commutativity of
addition states, for any two real numbers x and y, that


x + y = y + x.


So the order of addition doesn’t matter. If we are asked to add 3 
and 4, we get a sum of 7, whether we work this out as 3 + 4 or 4 + 3. 
Other axioms relate to the ordering of the real numbers, and one 
example is the transitivity of order. This states that 


if x < y and y < z, then x < z.


A more subtle assumption is the completeness axiom. This states 
that 


an increasing bounded real sequence converges. 


An increasing sequence is one in which the next term is always at 
least as great as the current term, and a bounded sequence is one 
where all the terms lie between two fixed real numbers. By way of 
example, the first two sequences from


1, 2, 3, 4, 5, ...,
3, 3.1, 3.14, 3.141, 3.1415, ...,
0, 1, 0, 1, 0, ...


are increasing; the third sequence is not increasing, as the
third term of 0 is less than the second term of 1. However,
the first sequence does not have an upper bound. The terms
of the second sequence are the truncated decimal
expansions of π; these terms are bounded above by 4 and
below by 3. The completeness axiom states that this sequence
converges, which it does, namely to π. (Fuller details about the
axioms appear in the Appendix.)


Once we have agreed axioms, we can prove some first analytic
results such as the uniqueness of limits. We showed earlier
that the sequence 0.9, 0.99, 0.999, ... converges to 1, but
did not contemplate whether another real number might
also be the limit of the sequence. In fact, this cannot arise:
a limit, if it exists, is unique. Other early results include the
algebra of limits.


Here are two convergent sequences: 
0.1, 0.11, 0.111, 0.1111, ...
0.1, 0.18, 0.181, 0.1818, ...


The first sequence converges to 1/9 and the second converges to 2/11.
We can create a new sequence by adding the sequences termwise 
to get 


0.2, 0.29, 0.292, 0.2929, ... 


It turns out that this sequence converges, and moreover converges
to


29/99 = 1/9 + 2/11.


The sum of these two sequences converges to the sum of their
limits, and similar results hold when we subtract, multiply, or 
divide convergent sequences. It is precisely the algebra of limits 
that makes valid the earlier argument, which showed that 
0.3333... = 1/3. By the completeness axiom, we know that
0.3333... represents some real number x, and by the algebra of
limits we know that 


3, 3.3, 3.33, 3.333, ... converges to 10x. 


The difference of the sequences, which is the constant sequence
3, 3, 3, 3, ..., converges to the difference of the limits
10x − x = 9x. By the uniqueness of limits, 9x = 3 and hence
x = 1/3.
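

The algebra of limits can be watched in action on the two example sequences 0.1, 0.11, 0.111, ... and 0.1, 0.18, 0.181, ... The following is a minimal sketch with exact fractions; the helper name truncations and the choice of eight terms are assumptions made for illustration. It adds the sequences termwise and measures how far each sum is from 1/9 + 2/11.

```python
from fractions import Fraction

def truncations(digits, n):
    """First n truncated decimals 0.d1, 0.d1d2, ... of a repeating digit pattern."""
    out = []
    value = Fraction(0)
    for k in range(1, n + 1):
        digit = int(digits[(k - 1) % len(digits)])
        value += Fraction(digit, 10**k)
        out.append(value)
    return out

a = truncations("1", 8)     # 0.1, 0.11, 0.111, ...
b = truncations("18", 8)    # 0.1, 0.18, 0.181, ...
sums = [x + y for x, y in zip(a, b)]

limit = Fraction(1, 9) + Fraction(2, 11)
print(limit)
print([float(limit - s) for s in sums])
```

The gaps shrink by roughly a factor of ten at each step, as one would expect if the termwise sum converges to the sum of the two limits.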

Such positive results are reassuring, but there is still room for some 
early counter-intuitive results. If we have an infinite sum of 
positive terms which converges, such as 


1/4 + 1/16 + 1/64 + 1/256 + 1/1024 + ··· = 1/3


(Figure 1(b)), then however we reorder the terms, the sum
will still converge to 1/3. This agrees with our experience with
finite sums; from the axioms of the real numbers it can be
proved that a finite number of terms give the same result, in whatever
order they are added. But this need not remain true for infinite
sums involving both positive and negative terms. Recall the 
Madhava sum: 


0.78539... = π/4 = 1 − 1/3 + 1/5 − 1/7 + 1/9 − 1/11 + 1/13 − ···


We can reorder the terms as


1 + 1/5 − 1/3 + 1/9 + 1/13 − 1/7 + ···


so that the positive terms are summed two at a time compared with 
the negative terms. Note that all the terms are still present, and none 
duplicated; the second sum has all the same terms as the first, just in 
a new order. However, this time the infinite sum can be shown to 
converge to 0.95868..., which is greater. (For those with knowledge
of logarithms, the exact sum is (π + log 2)/4.) In fact, the German
mathematician Bernhard Riemann showed in 1853 that these terms 
can be rearranged to sum to any value, finite or infinite. If we 
wanted, say, a sum of 100, then we would carefully need to 
front-load the rearrangement with positive terms to manage this, 
but we could achieve this, as, importantly, there are infinitely many 
positive terms, and the sum of those positive terms is infinite. 
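

Both claims can be explored numerically. The sketch below is illustrative only and rests on some assumptions: it uses floating-point arithmetic, the function names are hypothetical, and the second function aims at a target of 1.5 rather than 100, since reaching 100 would need far more positive terms than any computer could add. The greedy rule used there (add unused positive terms while at or below the target, otherwise subtract the next unused negative term) is one simple way to carry out such a rearrangement.

```python
import math

def rearranged_sum(blocks):
    """Sum 1 + 1/5 - 1/3 + 1/9 + 1/13 - 1/7 + ...: two positive terms, then one negative."""
    total = 0.0
    for j in range(blocks):
        total += 1.0 / (8 * j + 1)   # positive terms 1, 1/9, 1/17, ...
        total += 1.0 / (8 * j + 5)   # positive terms 1/5, 1/13, 1/21, ...
        total -= 1.0 / (4 * j + 3)   # negative terms 1/3, 1/7, 1/11, ...
    return total

def rearranged_to_target(target, steps):
    """Greedy rearrangement of the same terms, steering the running total towards target."""
    total = 0.0
    next_pos = 0   # index j of the next unused positive term 1/(4j + 1)
    next_neg = 0   # index j of the next unused negative term 1/(4j + 3)
    for _ in range(steps):
        if total <= target:
            total += 1.0 / (4 * next_pos + 1)
            next_pos += 1
        else:
            total -= 1.0 / (4 * next_neg + 3)
            next_neg += 1
    return total

print(rearranged_sum(10000), (math.pi + math.log(2)) / 4)
print(rearranged_to_target(1.5, 100000))
```

In both functions every term of the original sum is eventually used exactly once; only the order changes.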


This may or may not be strikingly counter-intuitive to you, but
I hope you agree that some people would find it so. Without careful
definitions, without careful proofs, it is impossible to convince
others when intuition fails.


Modern analysis 


Analysis itself arose as a separate subject within mathematics 
around the 19th century. We shall see, though, that the terms 
analysis and analytic occurred much earlier, particularly in the 
17th century with the work of Fermat and Descartes on ‘analytic
geometry’, that is, co-ordinate geometry.


The 19th century was when modern analytic treatments became
recognizable, such as so-called ε-δ (read: ‘epsilon-delta’) proofs
(Chapter 2). The definition of convergence given earlier is
commonly attributed to the German analyst Karl Weierstrass, who
was lecturing in Berlin on such material in 1861. Weierstrass is
often considered the father of modern analysis and remembered
for introducing ε-δ arguments. Much earlier, though, in 1817,
Bernard Bolzano had made the same definition and proved several
important theorems of analysis, but his work did not receive due
attention for another 50 years. Augustin-Louis Cauchy also made
use of ε-δ proofs in his influential Cours d'Analyse of 1821. So, as
with much mathematics, a rigorous treatment of real analysis arose 
over decades and from the contributions of many. 


A modern undergraduate mathematics course on calculus 
(Chapter 2) includes limits as a central, foundational concept. But 
by the 19th century calculus was almost two centuries old; it had 
been widely applied and studied without a formal definition of 
limit, though that is not to say it had been without its critics. 
Increasingly, a rigorous approach to analysis, as well as a 
broadening in the notion of a function, was becoming necessary, 
especially in the treatment of Fourier series (Chapter 6). A growing 
appreciation of analytic matters would lead to more than just a 
firmer understanding of old results. As we shall see in later 
chapters, analysis would find a canon all of its own, and the theory 
and methods of analysis would have impact across mathematics 
and find applications in much of science.


Chapter 2
All change ... the calculus of Fermat, Newton, and Leibniz


Calculus 


The invention of calculus is traditionally credited to Newton and 
Leibniz in the late 17th century, though their progress was very 
much built on the work of others. Given the sheer range of 
applications it has found in mathematics and the physical sciences, 
calculus can arguably be described as humankind’s single greatest 
invention from the last 500 years. It has, as the mathematical 
historian C. H. Edwards writes, ‘served for three centuries as the 
principal quantitative language of Western science’. Calculus most 
naturally falls into two branches: differential calculus—the study 
of rates of change—and integral calculus—the study of 
accumulated changes. These two processes, differentiation and 
integration, are essentially inverses of one another, a fact made 
explicit in the fundamental theorem of calculus. 


Some of the questions calculus addresses date to the ancient 
Greeks, but much of the early focus of calculus related to 
contemporary scientific problems, particularly physical and 
astronomical ones. Within geometry, calculus can determine 
tangent lines, areas, and volumes; beyond pure mathematics, and 
particularly via differential equations, calculus would help model 
and understand much of the world around us. As they evolved, the 
approaches to calculus would take on the flavour of the new
algebra and new co-ordinate geometry, much more than that of
ancient Greek geometry. We shall see that calculus involves the 
sorts of limit processes that are now naturally thought of as part of 
modern analysis. The rise of calculus and its widespread 
applications would only add to the urgency of developing a 
rigorous grounding for the subject. Equally, the breadth of 
application of calculus was just a precursor to the subsequent 
impact of modern analysis across mathematics and science. 


To better appreciate how and why calculus developed, we first need 
to review the important advances of the previous century. 


The 17th century 


The 17th century was transformative for European mathematics. 
Between the ancient Greek era and 1600 there had been important 
advances: the Hindu-Arabic number system had been introduced, 
promoted by Fibonacci in his Liber Abaci of 1202; cubic (degree 3) 
and quartic (degree 4) polynomial equations had been solved by 
Italian mathematicians in the 16th century; François Viète had 
made important improvements in algebraic notation. But as of 
1600 the canon of European mathematics was mostly ancient 
Greek in both content and emphasis. The advances made by the 
Kerala school in India, for example in the use of power series, were 
unknown to European mathematicians of the time. 


17th-century European mathematics would advance in various 
ways: in 1614 John Napier introduced logarithms (Chapter 3); 
projective geometry developed from the study of perspective in art; 
and, in a correspondence of 1654, Pierre de Fermat and Blaise 
Pascal laid down many of the fundamentals of probability. 


But as far as the development of calculus is concerned, the two 
important advances of the 17th century were co-ordinate geometry, 
also known as analytic geometry, and the concept of function.


Synthetic versus analytic geometry


The word ‘synthesis’ means a process of thought leading from 
cause to effects; in this sense the geometry of Euclid is synthetic, 
with theory carefully being deduced from assumed axioms. This 
contrasts with ‘analysis’, which means a process of thought leading 
from effects to cause—or at least this is the word’s original 
etymology. In the word ‘psychoanalysis’ we can see that the term 
has retained its roots—underlying disorders are diagnosed from 
visible symptoms and behaviours—but we shall see that the 
meaning of ‘analysis’ in mathematics has evolved considerably 
since the 17th century. 


Back then, the phrase ‘analytic geometry’ referred to co-ordinate 
geometry. The methods of analytic geometry introduce 
undetermined quantities, such as x and y, and the geometric 
constraints on these unknown quantities manifest as equations 
involving x and y which are to be solved. At that time, the term 
analysis was largely synonymous with such use of algebra and 
equations. 


By way of contrast, we give three proofs of Thales’ theorem, which 
states that the angle made by a diameter in a semi-circle is a right 
angle. They are included to show how different the language of the 
mathematics is, so don’t be concerned if some of the reasoning is 
unfamiliar. 


Each proof begins with a circle, centre O and diameter AB. The 
third point C of a triangle ABC lies on the circle. We wish to prove 
that the angle ∠ACB is a right angle (Figure 3).


3. Three proofs of Thales’ theorem (3(a), 3(b), 3(c)).


PROOF 1: Draw in the line OC. As OA and OC are radii, the
triangle AOC is isosceles and so the base angles ∠OAC and ∠OCA
are equal. Likewise, angles ∠OBC and ∠OCB are equal. Then
2∠OCA + 2∠OCB equals two right angles, as they make up all the
angles of the triangle ACB. So finally ∠ACB = ∠OCA + ∠OCB
equals one right angle.


PROOF 2: Choose co-ordinates so that O = (0, 0), A = (−1, 0),
B = (1, 0), C = (x, y); by Pythagoras’ theorem, the circle’s
equation is then x² + y² = 1. The gradient of CB equals y/(x − 1) and
the gradient of CA equals y/(x + 1). The product of these gradients
equals


y/(x − 1) × y/(x + 1) = y²/(x² − 1) = y²/(−y²) = −1,


and so the lines CB and CA are at right angles.


PROOF 3: With co-ordinates chosen as in Proof 2, consider the
vectors AC = (x + 1, y) and BC = (x − 1, y). Their scalar
product equals


AC · BC = (x − 1)(x + 1) + y² = x² − 1 + y² = 0.


Hence AC and BC are perpendicular. 
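

The two co-ordinate proofs lend themselves to a quick numerical sanity check. The sketch below is not a proof, only a spot check at a few sample points; it assumes the co-ordinates of Proof 2, writes the point C as (cos t, sin t) for an angle t (a parametrization introduced here for convenience), and uses the hypothetical function name thales_checks.

```python
import math

def thales_checks(t):
    """Return (product of gradients of CB and CA, scalar product AC . BC) for C at angle t."""
    x, y = math.cos(t), math.sin(t)          # a point C on the unit circle
    gradient_product = (y / (x - 1)) * (y / (x + 1))
    scalar_product = (x + 1) * (x - 1) + y * y
    return gradient_product, scalar_product

# Angles chosen to avoid C coinciding with A or B, where a gradient is undefined.
for t in (0.3, 1.0, 2.5):
    print(t, thales_checks(t))
```

Up to rounding error, the first value is −1 and the second is 0 at every sample point, in line with Proofs 2 and 3 respectively.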


The first proof is entirely synthetic; it relies on previous results, 
such as the base angles of isosceles triangles being equal and the 
angles of a triangle adding up to two right angles. No mention of 
co-ordinates is made, the entire problem being set in a featureless 
Euclidean plane.


The second proof is analytic. It introduces co-ordinates into the
Euclidean plane and does so, without any loss of generality, to 
make the subsequent algebra as palatable as possible. We place the 
origin on the circle’s centre, align the x-axis with the diameter, and 
take the circle’s radius as the unit length. The general point C is 
assigned co-ordinates (x, y), and all we know is that x² + y² = 1,
this being the equation of the circle. The proof relies on the fact 
that two lines are perpendicular if the product of their gradients 
equals —1. Note that the concept of gradient doesn’t even make 
sense in synthetic geometry; it’s a notion we can only assign to lines 
once we have introduced co-ordinates. This second proof is closest 
in style to a 17th-century analytic proof of Thales’ theorem. 


The third proof has a more modern style which would not have 
appeared until the 20th century, though it is essentially the same as 
the second proof. It makes use of vectors and the scalar product. 


Analytic geometry and the function concept 


Analytic, or co-ordinate, geometry was independently introduced 
by René Descartes and Fermat; Descartes’ work was published in 
1637, but Fermat’s only appeared posthumously in 1679. Instead of 
a featureless Euclidean plane, the Cartesian plane—named in 
honour of Descartes—has two perpendicular axes, the horizontal 
x-axis and the vertical y-axis, meeting at the origin. Every point of
the Cartesian plane can be uniquely assigned x- and y-co-ordinates,
depending on how far along the x- and y-axes the point is
(Figure 4(a)). Note that points to the left of the y-axis have a negative
x-co-ordinate, so the co-ordinates are displacements from the axes
rather than simply distances. (Distances cannot be negative.) 


Given the advances the ancient Greeks made in geometry, it is 
surprising that they made only rudimentary use of co-ordinates, 
with the arguable exception of Apollonius of Perga, but even he 
made no use of negative numbers and used co-ordinates to
investigate curves that were initially characterized by their
geometry.


4(a). The Cartesian plane. 4(b). The cissoid of Diocles.


Let’s take the cissoid of Diocles as an example (Figure 4(b)).
A circle of radius a has base point O which is diametrically opposite
the point A. Given a point R on the tangent line to the circle at A, the line OR
intersects the circle at Q, and a third point P on the line OR is such 
that the distances OP and QR are equal. The cissoid is the curve 
traced by P as R moves along the tangent line. 


The ancient Greeks investigated curves, like the cissoid, using such 
geometric constructions. But if we take O to be the origin and 
position the y-axis along OA, then the cissoid has the equation 


$$(x^2 + y^2)\,y = 2ax^2,$$


meaning a point (x, y) lies on the cissoid precisely when x and y
satisfy the above equation. (Details appear in the Appendix.)


Graphs are now so ubiquitous, representing data or some recent 
trend in the news, that it is hard for us to appreciate the 
revolutionary impact that the introduction of co-ordinates had on 
mathematics. But looking at the first and second proofs of 
Thales’ theorem reviewed earlier, the different emphases are
marked: one is geometric, one essentially algebraic, each making
use of starkly different methods. Also, the second proof implicitly
uses advances that might not be clear: an ease with using negative
numbers and much improved algebraic notation. Importantly,
there was no longer any primacy of geometry over algebra; the 
cissoid, for example, can be as readily described by the above 
equation as by its geometric construction. Further, with the 
introduction of co-ordinates, it’s natural to think of points on the 
cissoid as the graph of some function y(x) of a variable x. 


It was Leibniz who first coined the term ‘function’ in 1673. The
concept of a function barely existed prior to the 17th century, and
its development would continue into the 20th century. With
the advent of analytic geometry, it became natural to think of the
y-co-ordinate (or ‘dependent variable’) of a point on a curve as
a function of the x-co-ordinate (or ‘independent variable’).
By modern standards, the description that y might depend on x
‘in an algebraic or transcendental manner’ (to quote the Swiss 
mathematician Johann Bernoulli around 1697) is primitive. The 
development of the function concept and that of analysis would be 
intimately connected over the next three centuries, with the need 
for a rigorous and broad notion of a function often driving 
advances in analysis or vice versa. 


Pierre de Fermat 


The French mathematician Fermat (Figure 5(a)) is often termed the 
‘prince of amateurs’, as he was actually a lawyer and parlementaire 
in Toulouse by profession. He made significant contributions 
across much of mathematics and physics: he independently 
introduced Cartesian co-ordinates, including in three dimensions; 
in optics the principle that light takes the least time to travel 
between two points is due to him; and he made many advances in 
number theory, including proving that every prime number which 
is one more than a multiple of four can be written as a sum of two 
squares (e.g. $17 = 1^2 + 4^2$, $653 = 13^2 + 22^2$). Fermat, more than
any of his contemporaries, showed the power of algebraic, analytic
methods when applied to old and new problems; to quote Michael
Sean Mahoney from his biography of Fermat, ‘in a very real sense,
Fermat presided over the death of the classical Greek tradition in
mathematics’.


[Figure 5(a). Pierre de Fermat. Figure 5(b). Approximating the gradient.]


In the development of calculus, Fermat’s most important work
was his 1636 Method for Determining Maxima and Minima
and Tangents to Curved Lines. Consider the graph of a function
y = f(x) and the tangent line L at a point P = (a, f(a))
(Figure 5(b)). This tangent line just touches the graph, having the
same gradient as the graph at the point of intersection P.


In order to determine that gradient, Fermat considered a nearby
point Q = (a+h, f(a+h)) on the graph. The gradient of the
chord PQ is the change in the y-co-ordinate divided by the change
in the x-co-ordinate, namely:


$$\text{gradient of chord } PQ = \frac{f(a+h) - f(a)}{(a+h) - a} = \frac{f(a+h) - f(a)}{h}.$$


If h is a small non-zero real number, it’s reasonable to think that
the gradient of PQ will be a good approximation to the tangent’s
gradient. Note, importantly, we cannot just set h to be zero, as the
above fraction would become 0/0, which is meaningless. We want to
say that ‘h should become as small as possible’; simple as that 
phrase seems, it would take mathematicians two centuries to work 
out quite what they ought to be saying. Fermat himself was never 
explicit on this matter, introducing a notion of ‘adequality’ to 
describe this process, and historians of mathematics continue to 
discuss quite what he intended by this. 


Let’s consider a specific choice of function, $f(x) = x^2$. The
previous fraction becomes:


$$\frac{(a+h)^2 - a^2}{h} = \frac{a^2 + 2ah + h^2 - a^2}{h} = \frac{2ah + h^2}{h} = 2a + h.$$


Here it seems clear that we get the answer of 2a, as ‘h becomes
small’; indeed, it’s now valid to set h to equal zero to obtain
that answer. And in this case 2a is the correct gradient of the graph
$y = x^2$ at the point $(a, a^2)$.


More generally, Fermat showed that the graph $y = x^n$ has
gradient $na^{n-1}$ when x = a. Take special note of this result, as we
will repeatedly use it during the rest of the text.
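
The behaviour of the chord gradient as h shrinks can be explored numerically. The following is a minimal sketch rather than anything Fermat wrote; the names `chord_gradient` and `square` are my own, and the sample values of a and h are arbitrary.

```python
def chord_gradient(f, a, h):
    """Gradient of the chord from (a, f(a)) to (a + h, f(a + h)); h must be non-zero."""
    if h == 0:
        raise ZeroDivisionError("h = 0 gives the meaningless fraction 0/0")
    return (f(a + h) - f(a)) / h


def square(x):
    return x * x


a = 3.0
for h in (1.0, 0.1, 0.01, 0.001):
    # For f(x) = x^2 the chord gradient is 2a + h, up to rounding error.
    print(h, chord_gradient(square, a, h))
```

As h decreases, the printed gradients approach 2a = 6, in line with the algebra above.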


What might not be apparent is that this calculation works out
nicely because $x^n$ is a polynomial when n > 0, that is, a function of
the form


$$f(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_k x^k,$$


where k is a non-negative whole number and $c_0, c_1, c_2, \ldots, c_k$ are real
numbers. These functions are sufficiently nice that, via some algebraic
manipulation, Fermat was left in a position where he could just set h
to equal zero. The curves of interest to Fermat were defined by
polynomial equations in the co-ordinates x and y, so he never needed
to consider how h tended to zero or introduce the notion of limit.


[Figure 6(a). Gradients at extrema. Figure 6(b). Area under $y = x^n$.]


Fermat also considered where functions have a maximum or 
minimum (Figure 6(a)). At such points the gradient is zero—or 
equivalently the tangent line is horizontal. This result is referred to 
as Fermat’s theorem. Such points are called stationary points. 


In his Treatise on Quadrature of around 1658, Fermat was also
able to calculate areas under the graphs of polynomials. He showed
that the shaded area (Figure 6(b)) under the graph $y = x^n$, lying
above the x-axis and between the lines x = 0 and x = a, equals


$$\frac{a^{n+1}}{n+1},$$


for any rational n not equal to $-1$. This area can be approximated
by the total area of a collection of rectangles (Figure 6(b)); as the
width of these rectangles becomes small, their total area gets ever
closer to that under the graph. Here, again, we have a limit
process being implemented, albeit in the absence of rigorous
definitions at the time. Note that the above formula is nonsensical
when $n = -1$, as the denominator is zero; we shall address this
special case in Chapter 3 when we discuss logarithms.
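
The rectangle approximation can be tried out directly. This is only an illustrative sketch and not a reconstruction of Fermat’s own argument: the function name `rectangle_area`, the equal-width strips, and the choice of taking each height at the midpoint of the strip are all my assumptions.

```python
def rectangle_area(n, a, strips):
    """Total area of `strips` equal-width rectangles under y = x**n for 0 <= x <= a.

    Each rectangle's height is the value of x**n at the midpoint of its base.
    """
    width = a / strips
    return sum(((i + 0.5) * width) ** n * width for i in range(strips))


a, n = 1.5, 2
for strips in (10, 100, 1000):
    # Compare the rectangle total with the formula a**(n + 1) / (n + 1).
    print(strips, rectangle_area(n, a, strips), a ** (n + 1) / (n + 1))
```

With more, narrower rectangles the total should settle ever closer to the value given by the formula.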


Many of the key ideas of calculus appear in Fermat’s work, 
which was fundamental to later developments. But there are 
important absences as well, so he is not usually credited as a
founder of calculus alongside Newton and Leibniz. Fermat was
working with algebraic functions which could be manipulated to a 
point where it was not necessary to take a limit. More than a century 
would pass before a rigorous notion of limit was understood. 


Fermat was content to apply his methods to specific problems. He
never gave a name to the gradient function of the graph of y = f(x)
(as 2x is to $x^2$), which would now be referred to as its derivative
and denoted by f′(x). The process of determining the derivative is
called differentiation and the process of determining the area 
under a graph is called integration; the ‘fundamental theorem’ 
connecting these processes would need to wait until the next 
generation of mathematicians. 


The fundamental theorem of calculus 


Given a function f(x), with a defined gradient f′(x) everywhere,
f′(x) is itself a function called its derivative and f(x) is said to be
differentiable. Visually, we can think of f′(x) as the gradient of
the graph y = f(x) at the point (x, f(x)), but it will also be useful to
think of f′(x) as a measure of how quickly f(x) is varying as x
increases. For example, if f′(x) is positive then f(x) increases with
x and if f′(x) is negative then f(x) is decreasing. The process
associating f′(x) with f(x) is called differentiation. Any function
F(x) such that F′(x) = f(x) is called an antiderivative of f(x).


Firstly, note that not all functions are differentiable. Consider the 
graph of the function 


$$f(x) = |x| = \begin{cases} x & \text{if } x \geq 0, \\ -x & \text{if } x < 0, \end{cases}$$


called the modulus function (Figure 7). This function has
a well-defined gradient of 1 for x > 0 and $-1$ for x < 0, but doesn’t
have a defined gradient at x = 0. Informally, this is because the
graph has a corner at x = 0 or because the function changes
jerkily there. More formally, the gradient of the chord between
(0, 0) and (h, |h|) equals


$$\frac{f(0+h) - f(0)}{h} = \frac{|h|}{h} = \begin{cases} 1 & \text{if } h > 0, \\ -1 & \text{if } h < 0, \end{cases}$$


so that, as h becomes small, we do not get a single value for the
derivative. There are yet more pathological functions that, despite
being continuous, do not have a well-defined gradient at any point.


[Figure 7. The modulus function.]
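
The two-sided behaviour at the corner is easy to see numerically. In this small sketch the helper name `chord_gradient_at_zero` and the sample values of h are my own choices.

```python
def chord_gradient_at_zero(h):
    """Gradient of the chord of y = |x| between (0, 0) and (h, |h|), for non-zero h."""
    return (abs(0 + h) - abs(0)) / h


steps = (0.1, 0.01, 0.001)
right = [chord_gradient_at_zero(h) for h in steps]    # h > 0
left = [chord_gradient_at_zero(-h) for h in steps]    # h < 0

print(right)  # prints [1.0, 1.0, 1.0]
print(left)   # prints [-1.0, -1.0, -1.0]
```

However small h is taken, the chord gradients from the right and from the left never agree, so no single gradient can be assigned at x = 0.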


In this terminology, Fermat had shown that the derivative of
$f(x) = x^n$ is $f'(x) = nx^{n-1}$, so that an antiderivative of $f(x) = x^n$
is $F(x) = x^{n+1}/(n+1)$. I write ‘an’ antiderivative of f(x) because, for any
constant c, the function F(x) + c is also an antiderivative of f(x).
This is because adding a constant c just moves the graph of F(x) up
or down and so doesn’t alter the gradient.


You may note that the expression for F(x) looks remarkably like
the formula $a^{n+1}/(n+1)$, which equals the area under the graph of f(x)
lying between x = 0 and x = a (Figure 6(b)). This is a first instance
of the fundamental theorem of calculus, which states that if F(x)
is an antiderivative of f(x), then the area under the graph of
y = f(x), above the x-axis and between x = a and x = b, equals


$$F(b) - F(a).$$


[Figure 8. Signed area.]


Note that we still arrive at the same answer if we instead use the 
antiderivative F(x) + c, as the two c terms cancel out.


In fact, ‘signed area’ would be a better description of what F(b) - F(a)
represents. Area is always positive, but when the graph of f(x) is
below the x-axis, the area between the graph and the x-axis
contributes negatively to F(b) - F(a). For example, we see above
(Figure 8) a graph of $y = f(x) = x^3 - x$ which has antiderivative


$$F(x) = \frac{x^4}{4} - \frac{x^2}{2},$$


so that F(1) - F(-1) = 0; this does not represent the shaded area
(Figure 8). Rather it represents the signed area $A_1 - A_2$, as the area
$A_2$ is below the x-axis and so counts negatively to the total signed
area; as $A_1 = A_2$, the total signed area is zero. To calculate
the shaded area as a genuine area, we instead need to determine
$A_1 + A_2$. Now


$$A_1 = F(0) - F(-1) = \tfrac{1}{4} \quad\text{and}\quad -A_2 = F(1) - F(0) = -\tfrac{1}{4},$$


so that the total area $A_1 + A_2$ equals $\tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}$. The standard
mathematical notation for the signed area under the graph
y = f(x) and within the interval $a \leq x \leq b$ is


$$\int_a^b f(x)\,dx,$$


so that the fundamental theorem of calculus states that


$$\int_a^b f(x)\,dx = F(b) - F(a),$$


where F(x) is an antiderivative of f(x). The symbol ∫ is
called an integral sign and is an elongated ‘s’, standing for
‘sum’ (or ‘summa’ in Latin), and the expression on the
left-hand side would be referred to as an integral. More
precisely, it is referred to as a definite integral, as it has
limits a and b. An indefinite integral is synonymous with an
antiderivative.
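
The distinction between signed area and genuine area for $y = x^3 - x$ can be checked with a few lines of arithmetic. This is a sketch only; the variable names are mine, with `A1` and `A2` standing for the two regions lying above and below the x-axis.

```python
def f(x):
    return x**3 - x


def F(x):
    """An antiderivative of f: differentiating x**4/4 - x**2/2 gives x**3 - x."""
    return x**4 / 4 - x**2 / 2


signed_area = F(1) - F(-1)   # A1 - A2
A1 = F(0) - F(-1)            # the graph is above the x-axis for -1 < x < 0
A2 = -(F(1) - F(0))          # the graph is below the x-axis for 0 < x < 1
total_area = A1 + A2

print(signed_area, A1, A2, total_area)  # prints 0.0 0.25 0.25 0.5
```

The signed area vanishes because the two regions cancel, whereas splitting the interval at x = 0 and flipping the sign of the negative part recovers the genuine area.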


The connection between integration and area is important to the 
applications of calculus, but the fundamental theorem is most 
easily understood in terms of rates of change. Here dx represents
an infinitesimal increase in x so that f(x)dx = F′(x)dx is how
much F(x) has increased (or decreased) during the same interval.
The integral $\int_a^b F'(x)\,dx$ is the sum of all these infinitesimal
changes in F(x) and so equals the total change in F(x) as x varies
from a to b, which is just F(b) - F(a).


[Figure 9(a). Infinitesimal rectangles. Figure 9(b). Fundamental theorem of calculus.]


In terms of area, f(x)dx is the area of an infinitesimal rectangle
of width dx and height f(x) which lies under the graph of f(x)
(Figure 9(a)). At least, this is the case if f(x) is positive; if f(x)
is negative, then f(x)dx is the signed area of an infinitesimal
rectangle lying under the x-axis and above the graph. Here we
see a very informal sketch of why the fundamental theorem is
true (Figure 9(b)).


Consider the signed area G(b) under the graph of f(x) between
x = a and x = b, thinking of a as fixed but b as varying. In the
above notation


$$G(b) = \int_a^b f(x)\,dx.$$


If we increase b by a very small amount h, then the area increases
by exactly G(b + h) - G(b). But this area is also approximately
f(b)h, the area of the grey rectangle (Figure 9(b)). So


$$G(b+h) - G(b) \approx f(b)h,$$


where the symbol ≈ denotes ‘approximately equals’; the two
sides of the above ‘equation’ only differ by the area of the
small white triangle above the rectangle, which is relatively
negligible. Hence


$$\frac{G(b+h) - G(b)}{h} \approx f(b),$$


and as we decrease h to 0 we find in the limit that G′(b) = f(b).
This means G(x) is an antiderivative of f(x). Now G(a) = 0 by
definition (there is zero area between x = a and x = a), so finally


$$\int_a^b f(x)\,dx = G(b) - G(a) = F(b) - F(a),$$


where F(x) is any antiderivative of f(x); as commented earlier, 
this difference is the same for any choice of antiderivative. 
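
The informal argument that G′(b) = f(b) can be tested on an example. In the sketch below everything specific is an assumption of mine: the integrand $x^2 + 1$, the fixed point a = 0, the helper names `signed_area` and `G`, and the use of midpoint rectangles to approximate the area.

```python
def f(x):
    # Illustrative integrand (an assumption; any continuous function would do).
    return x * x + 1.0


def signed_area(g, a, b, strips=10000):
    """Midpoint-rectangle approximation to the signed area under g from a to b."""
    width = (b - a) / strips
    return sum(g(a + (i + 0.5) * width) for i in range(strips)) * width


def G(b, a=0.0):
    """Signed area under f between the fixed point a and the varying point b."""
    return signed_area(f, a, b)


b, h = 2.0, 1e-3
quotient = (G(b + h) - G(b)) / h   # gradient of a chord of G near b
print(quotient, f(b))              # the two values should be close
```

Shrinking h further brings the quotient closer to f(b), up to the accuracy of the rectangle approximation.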


A first rudimentary version of the fundamental theorem of calculus 
was proven using geometric methods by the Scottish 
mathematician James Gregory in 1668. In fact, in his unpublished 
work, Gregory had developed many crucial ideas of Newton and 
Leibniz, but his tragically premature death, aged 36, means that he 
is not widely remembered for his contributions. His work would 
not become more generally known until a memorial volume 
marked the tercentenary of his birth. 


Significant though Fermat’s contributions were, it’s clearer now 
what was missing from his work. He never gave a name to the 
process of differentiation nor appreciated the inverse nature of 
differentiation and integration. Following Fermat, there was a 
coherence of understanding among mathematicians of these 
processes; differentiation was shown to have nice algebraic 
properties (see Appendix for more details); important 
improvements in notation were introduced; there were wide
applications of calculus. But we will also see that 18th-century
calculus still lacked rigour in many important aspects.


The calculus of Newton and Leibniz 


Isaac Newton was one of the greatest and most influential 
mathematicians and scientists throughout history, and a 
significant figure during the Enlightenment. Besides being 
remembered, alongside Leibniz, as one of the developers of 
calculus, he made seminal contributions to the study of classical 
mechanics, gravity, and optics. Much of his work involved the 
application of calculus to real-world problems—this is evident in 
his three laws of motion and his law of gravitation. Gottfried 
Leibniz, by contrast, was a mathematician and philosopher, and 
was interested in producing a coherent treatment of the new 
calculus with well-chosen and suggestive notation, as well as the 
mathematical results themselves. 


The previously used notation f′(x) for the derivative was actually
introduced by Joseph-Louis Lagrange in 1797. Leibniz began using
his notation for integrals and derivatives in 1675. We have already
seen his notation for the integral


$$\int_a^b f(x)\,dx.$$


Leibniz envisaged dx as an infinitesimal increment in x so that the
above is a sum of signed areas of infinitesimal rectangles. In a
similar manner, if we write


$$\Delta y = f(x+h) - f(x), \qquad \Delta x = h$$


for the changes in y and x (Figure 5(b)), then Δy/Δx is the gradient
of PQ, which, in the limit as Q approaches P, Leibniz wrote as


$$\frac{dy}{dx}.$$


This is read as ‘d y by d x’. Note that dy/dx shouldn’t be considered
as a fraction: it is a measure of how y changes as x changes, and
has no meaning as a fraction of quantities ‘dy’ and ‘dx’.


Newton’s contrasting approach shows his roots as an applied
mathematician. He considered a point P = (x(t), y(t)) varying on
a curve with time t. He denoted the horizontal and vertical
velocities of P as $\dot{x}$ and $\dot{y}$ so that


$$\dot{x} = \frac{dx}{dt} \quad\text{and}\quad \dot{y} = \frac{dy}{dt}$$


in Leibniz’s notation, and then instead defined


$$\frac{dy}{dx} = \frac{\dot{y}}{\dot{x}},$$


referring to $\dot{x}$ and $\dot{y}$ as fluxions.


By modern standards, none of the above is satisfactorily rigorous. 
Leibniz is still using infinitesimals and referring to the ‘infinitely 
small’ and Newton is using undefined ‘fluxions’. Neither had a 
rigorous sense of what a limit means, and if dx is to be understood
as the limit of Δx as it approaches zero, then dx is
indistinguishable from zero and dy/dx is indistinguishable from
0/0, which is meaningless.


Thus—and despite the many applications of calculus—there were 
critics of calculus’ logical foundations, and these issues would 
remain into the 19th century. The most trenchant criticism would 
come from Bishop Berkeley in his 1734 work The Analyst. He 
clearly felt adherents of the calculus were trying to have their cake 
and eat it, writing of infinitesimals:


They are neither finite quantities, nor quantities infinitely small, nor
yet nothing. May we not call them the ghosts of departed quantities?


The quote that calculus was ‘a collection of ingenious fallacies’ has 
been ascribed to the French mathematician Michel Rolle. 
Whether or not Rolle actually said this, it certainly captures his 
concerns for the foundations of calculus. 


Berkeley’s and Rolle’s criticisms were important ones, not least 
because of the wide success of the applications of calculus. Raising 
such questions did not detract from the impact of the work of 
Newton, Leibniz, and others, but did highlight the need for 
mathematics to get its house in order. 


Newton’s physics 


Much of Newton’s interest in calculus was due to its applications to 
current scientific problems. Calculus naturally finds applications 
in physics, as some derivatives have physical relevance. If a 
particle moves along the real number line so that at time t its 
position is x(t), then the derivative 


$$\frac{dx}{dt} \quad\text{or}\quad x'(t)$$

is its velocity. Note this velocity may be positive—when the particle 
moves left to right—or may be negative—when the particle moves 
right to left—or zero when the particle is stationary; by contrast, 
speed is the magnitude of velocity and cannot be negative. 


The derivative of velocity is acceleration, which is denoted by


$$\frac{d^2x}{dt^2} \quad\text{or}\quad x''(t).$$


[Figure 10. Kepler’s first two laws.]


Acceleration can be positive, zero, or negative; if it is always zero, 
then the particle has constant velocity; if x is increasing, then 
acceleration is positive/negative when the particle is speeding
up/slowing down. Acceleration has importance in modelling physical
systems because of Newton’s second law of motion, which states
that the force acting on a particle is equal to the particle’s mass 
multiplied by its acceleration. Many physical applications begin 
with the second law; its application leads to information about a 
second derivative in terms of a system’s current status, giving one 
or more differential equations (Chapter 3). 


Between 1609 and 1619 Johannes Kepler stated three laws of 
planetary motion—these were laws Kepler had produced based on 
observed astronomical data. The laws, two of which are depicted 
in Figure 10, state: 


• A planet E orbits the Sun S in an ellipse with the Sun at one
focus of the ellipse.


• Over a given time interval, the shaded area swept out by the
line connecting the Sun and planet is always the same.


• The square of a planet’s year is proportional to the cube of the
orbit’s major axis. The constant of proportionality is the same
for each planet in the solar system.


Using his second law of motion and his law of gravity (that the
force between two particles is proportional to each mass and
inversely proportional to the square of the distance between
them), Newton could mathematically prove all of Kepler’s three
laws. The details of deriving Kepler’s laws mathematically are
beyond this text, but applying Newton’s law of gravity leads to a
differential equation; solving this differential equation gives a
function describing the orbit of E, which can be recognized as an
ellipse. Then both the year of E’s orbit and the shaded area above
(Figure 10) can be represented by integrals involving the orbit’s
function.
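
Although the derivation is beyond this text, a computer can step through the second law of motion numerically and watch Kepler’s second law emerge. What follows is a rough sketch under loudly stated assumptions: the Sun is held fixed at the origin, units are chosen so that the gravitational constant times the Sun’s mass equals 1, the starting position and velocity are arbitrary, and the ‘leapfrog’ stepping scheme and the name `simulate_orbit` are my own choices rather than anything from Newton.

```python
def simulate_orbit(steps=20000, dt=0.001):
    """Step a planet around a fixed Sun at the origin under an inverse-square force.

    Returns the small triangular area swept out by the Sun-planet line during
    each time step (an approximation to the true curved sector).
    """
    x, y = 1.0, 0.0      # illustrative starting position
    vx, vy = 0.0, 0.8    # illustrative starting velocity (gives a bound orbit)
    areas = []
    for _ in range(steps):
        # Half-step change in velocity: acceleration = -position / distance**3.
        r3 = (x * x + y * y) ** 1.5
        vx -= 0.5 * dt * x / r3
        vy -= 0.5 * dt * y / r3
        # Full-step change in position.
        new_x = x + dt * vx
        new_y = y + dt * vy
        # Triangle with corners at the Sun, the old position, and the new position.
        areas.append(0.5 * abs(x * new_y - y * new_x))
        x, y = new_x, new_y
        # Second half-step change in velocity, at the new position.
        r3 = (x * x + y * y) ** 1.5
        vx -= 0.5 * dt * x / r3
        vy -= 0.5 * dt * y / r3
    return areas


areas = simulate_orbit()
# Areas swept out over two equal time intervals at different parts of the orbit.
print(sum(areas[:1000]), sum(areas[7000:8000]))
```

The two printed areas should agree to many decimal places. This illustrates the second law rather than proving it; the agreement here reflects the fact that this particular stepping scheme preserves the quantity (angular momentum) responsible for the law.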


The Newton-Leibniz controversy 


Newton’s initial work on calculus dates to 1664-6, but he would 
not publish on fluxions until 1704 in an appendix to his Opticks. His
Principia (fully, Philosophiae Naturalis Principia Mathematica) 
of 1687 is primarily concerned with his laws of motion and gravity, 
though the work extensively uses arguments of calculus presented 
in a geometrical format. Leibniz, by contrast, did much of his work 
later than Newton, in 1672-6, but published his articles first, in 
1684 and 1686. So there remained the question of which of the two 
could be credited with the invention of calculus. 


The issue was somewhat muddied because in 1669 Newton had 
shared his work De Analysi with a limited number of people, some 
of whom Leibniz visited in London in 1672 and in Paris in 1673. 
This, of course, raises the question of whether Leibniz had become 
aware of the details of Newton’s work on his visits. 


The two mathematicians themselves cannot be blamed for the 
controversy’s initial development, which was started around 1700 
by other parties accusing Leibniz of plagiarizing Newton, or vice 
versa. The situation worsened when, in 1712, the Royal Society of 
London published a collection of the allegations against Leibniz; 
the president of the Society at the time was Isaac Newton! In 1713 
the Society pronounced on the dispute, unsurprisingly finding in 
favour of Newton.


The modern consensus is that the two mathematicians developed
the calculus independently. Ultimately, the significance of the 
controversy was not who was deemed to have priority at the time, 
but the polarizing effect the controversy had between English and 
continental mathematicians. English mathematics effectively cut 
itself off from mainstream continental mathematics, pure 
mathematics especially, and this situation would not be wholly 
remedied until the start of the 20th century. Further, because 
Newton’s methods had been largely geometric, it was on the 
continent that analytic methods would progress.


Chapter 3
To the limit: analysis in the
18th and 19th centuries


e and the exponential function 


Briefly digressing, consider the following scenario. You have
invested a sum S of money with a banker at an annual interest rate
of x, so that after one year you have earned xS and have in total
(1 + x)S. You realize that if you could convince the banker to offer
half the interest rate x/2 twice a year, you would have $(1 + x/2)^2 S$,
which is greater; this is because


$$\left(1 + \frac{x}{2}\right)^2 = 1 + x + \frac{x^2}{4} > 1 + x.$$


Indeed, you can improve further by having an interest rate of x/n
paid n times a year so that your money grows by a factor of


$$\left(1 + \frac{x}{n}\right)^n,$$


which is yet bigger and keeps getting bigger as n increases. It turns
out, however gullible your banker is, that the above product
reaches a limit as n becomes large.


The sequence grows as n increases but remains bounded. This
means the sequence has a limit, though the limits are different for
different choices of x. This ‘continuous compounding’ was first
investigated by Thomas Harriot around 1620, and the limit defines
the exponential function:


$$\exp(x) = \text{limit of } \left(1 + \frac{x}{n}\right)^n \text{ as } n \text{ becomes large.}$$


When x = 1 the value exp(1) is denoted as e. After π, e is the second
most important constant in all of mathematics.
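
The slow creep of the compounded amount towards its limit can be watched numerically. This is a minimal sketch; the function name `compound` and the sample values of n are my own, and Python’s built-in `math.exp` is used only as a reference value.

```python
import math


def compound(x, n):
    """Growth factor when interest x/n is paid n times a year."""
    return (1 + x / n) ** n


x = 1.0
for n in (1, 2, 12, 365, 10**6):
    print(n, compound(x, n))
print("math.exp(1.0) =", math.exp(x))
```

With x = 1 the growth factors begin 2 and 2.25 and continue to increase, while staying below e however frequently the interest is paid.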


As a value, e = 2.718281828459045... and has been calculated to
over $10^{12}$ places. The great Swiss mathematician Leonhard Euler
showed in 1737 that e is irrational—that is, e is not the ratio of two 
integers (see Appendix for a proof)—and in 1873 Charles Hermite 
proved that e is transcendental—meaning that e is not the solution 
of any polynomial equation with whole-number coefficients. The 
notation e was introduced by Euler and e is often referred to as 
Euler’s number, because of his clarifying work on the exponential 
function, despite the earlier studies of Harriot (and also Jacob 
Bernoulli). 


A graph of the exponential function can be seen below (Figure 11(a)).


[Figure 11(a). Graph of y = exp(x). Figure 11(b). Graphs of y = exp(x) and y = log(x).]


The phrase ‘exponential’ is figuratively used to describe rapid
growth; in a technical sense growth is exponential if, over a given
period, the same relative growth occurs. The same is true of
exponential decay; for example, the half-life of a radioactive
element is the time taken for half the atoms to decay into other
elements, and this time is constant irrespective of the amount of
material present. Consequently, the exponential function satisfies
the identity


$$\exp(x + y) = \exp(x)\exp(y)$$


for any real numbers x and y. Note that this means for a counting
number n that


$$\exp(n) = \exp(n-1)\exp(1) = \exp(n-2)\exp(1)^2 = \cdots = \exp(1)^n = e^n.$$


However, at this stage, we cannot simply write $\exp(x) = e^x$
for a general real number x. The notion that $e^x$ means ‘e times
by itself x times’ is nonsensical when, say, $x = \sqrt{2}$. But we
will see we can use the exponential function to define general
powers.


Logarithms and powers 


It makes sense to define $a^n$ as ‘a multiplied by itself n times’ only
when n is a positive whole number. When a is positive, we can
make sense of rational powers such as $a^{2/3}$ as the cube root of $a^2$,
but it remains unclear how a general power $a^x$ might be defined,
though it seems reasonable to still expect


$$a^{x+y} = a^x a^y$$


for any real numbers x and y. In order to define arbitrary powers,
we need to introduce logarithms.


From the graph in Figure 11(a), we see the exponential function
attains all positive values y. Moreover, as exp(x) is increasing, there
is a unique value x such that


$$\exp(x) = y.$$


This value x is the logarithm (or natural logarithm) of y,
and we write x = log(y), a common alternative notation
being x = ln(y). Importantly, the logarithm function has the
property


$$\log(xy) = \log(x) + \log(y)$$


for positive x, y; this identity corresponds to the previous identity
for exp(x + y) (see Appendix).


For positive a, we can now define general powers in terms of this
logarithm function as


$$a^x = \exp(x \log(a)).$$


This gives the desired earlier algebraic property as


$$a^x a^y = \exp(x\log(a))\exp(y\log(a)) = \exp(x\log(a) + y\log(a)) = \exp((x+y)\log(a)) = a^{x+y}.$$


Note that $a^x$ agrees with our previous notion of $a^x$ equalling a
multiplied by itself x times when x is a counting number.
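
The definition of a general power translates directly into code. In this sketch the function name `power` is mine, and Python’s `math.exp` and `math.log` (the natural logarithm) stand in for exp and log.

```python
import math


def power(a, x):
    """a to the power x for positive a, defined as exp(x * log(a))."""
    if a <= 0:
        raise ValueError("this definition requires a > 0")
    return math.exp(x * math.log(a))


print(power(2.0, 3))                    # close to 2 * 2 * 2
print(power(2.0, math.sqrt(2)))         # a power with no 'repeated multiplication' meaning
print(power(3.0, 0.5) * power(3.0, 1.25), power(3.0, 1.75))
```

The last line illustrates the rule for adding exponents: the two printed values should agree up to rounding error.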


When a = e, so that log(e) = 1, then $e^x = \exp(x)$, and so we will
write $e^x$ rather than exp(x) from now on. The function $a^x$ defines a
differentiable function of x which has derivative $a^x \log(a)$
(see Appendix). So when a = e, then $e^x$ equals its own derivative, an
important property of the exponential function and one we will
return to. The derivative of log(x) equals 1/x. Note this is the
antiderivative we have been missing for the powers $x^n$. The
antiderivative of $x^n$ is


$$\frac{x^{n+1}}{n+1}$$


when n does not equal $-1$, and in the case
$n = -1$ the antiderivative is log(x).


Before modern calculators and computers, logarithms were
an important part of how products were determined
because sums are much easier to calculate than products.
Logarithms were first introduced in 1614 by the Scottish
mathematician John Napier. His somewhat different definition
for Nlog(x), the Napierian logarithm of positive x, is the value
of y satisfying


$$x = 10^7\left(1 - \frac{1}{10^7}\right)^y,$$


which satisfies the identity


$$\mathrm{Nlog}\left(\frac{x_1 x_2}{10^7}\right) = \mathrm{Nlog}(x_1) + \mathrm{Nlog}(x_2).$$


Between 1617 and 1624 Napier’s tables were improved by
Henry Briggs, who introduced common logarithms or
logarithms to base 10, better suited to the decimal system.
The common logarithm y for positive x satisfies $x = 10^y$ and is
written $\log_{10} x$.


Tables of logarithms reduced difficult products to simpler sums. 
For example, suppose we wished to calculate 


$$230.1367 \times 1213.9743.$$


A table of common logarithms need only contain the logarithms
for numbers between 1 and 10 as


$$\log_{10} 230.1367 = \log_{10}(10^2 \times 2.301367) = 2 + \log_{10} 2.301367,$$
$$\log_{10} 1213.9743 = \log_{10}(10^3 \times 1.2139743) = 3 + \log_{10} 1.2139743.$$


On looking up these logarithms, we would find


$$\log_{10} 2.301367 = 0.361985881\ldots, \qquad \log_{10} 1.2139743 = 0.084209493\ldots.$$


(It may be that the tables only provide logarithms for inputs given
to fewer decimal places, but then it is possible to interpolate to find
logarithms for values between those given in the tables.) Then
$\log_{10}(230.1367 \times 1213.9743)$ equals


$$2.361985881\ldots + 3.084209493\ldots = 5.446195374\ldots,$$


an addition which is simple compared with the earlier
multiplication. Returning to our tables, we would find that
0.446195374 is the common logarithm of 2.793800392, and hence


$$230.1367 \times 1213.9743 = 10^5 \times 2.793800392 = 279380.0392$$


to four decimal places. The precise answer is found quickly with 
a modern calculator to be 279380.03928681, but it is easy to 
forget how recent an invention electronic calculators and 
computers are. Logarithm tables (and slide rules, which make 
use of logarithmic scales) were widely used until the early 1970s. 
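
The table-based procedure can be mimicked in code, with Python’s `math.log10` playing the part of the printed table. This is a sketch under that assumption; the helper names `split_log10` and `multiply_with_logs` are mine, and both inputs are assumed to be positive.

```python
import math


def split_log10(value):
    """Write log10(value) as a whole-number part plus the logarithm of a number in [1, 10)."""
    whole = math.floor(math.log10(value))
    table_entry = math.log10(value / 10**whole)   # what a table would supply
    return whole, table_entry


def multiply_with_logs(u, v):
    """Multiply positive u and v by adding their common logarithms."""
    whole_u, entry_u = split_log10(u)
    whole_v, entry_v = split_log10(v)
    total = (whole_u + entry_u) + (whole_v + entry_v)   # an addition replaces the product
    whole = math.floor(total)
    # Reverse lookup: find the number whose logarithm is the fractional part.
    return 10**whole * 10 ** (total - whole)


print(split_log10(230.1367))
print(multiply_with_logs(230.1367, 1213.9743), 230.1367 * 1213.9743)
```

The two values on the final line should agree up to rounding error, the first having been obtained without ever multiplying the original numbers directly.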


Power series and Taylor series 


Power series are infinite sums of the form


$$f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots.$$


They provide a very powerful tool in analysis, as demonstrated by
Newton in his De Analysi and later by Euler and Lagrange.


The real numbers $c_0, c_1, c_2, \ldots$ are considered fixed (for this power
series), whilst the real number x is considered an input which
may vary. Depending on the value of x, the above sum may
converge or not. When x = 0 we find that $f(0) = c_0$, and so the
sum definitely converges. But for x = 1, say, we see


$$f(1) = c_0 + c_1 + c_2 + \cdots,$$


which may or may not converge. As an example, consider the
power series


f(x) = 1 + x + x^2 + x^3 + \cdots,


where c_n = 1 for each n. When x = 1 we can see that f(1) = 1 + 1 +
1 + ... doesn’t converge. When x = -1 we obtain Grandi’s sum
f(-1) = 1 - 1 + 1 - 1 + ..., which also doesn’t converge. In fact,
it can be shown that f(x) converges precisely when -1 < x < 1. For
such x we can validly argue by the algebra of limits that:


f(x) = \frac{1}{1 - x}.


Hence


f(x) = 1 + x + x^2 + \cdots = \begin{cases} \frac{1}{1 - x} & \text{if } -1 < x < 1, \\ \text{undefined} & \text{for other } x. \end{cases}


Note the function 1/(1 - x) is defined more generally, namely
whenever x does not equal 1, and so should be considered a
different function to f(x). Rather, f(x) is a power series
representation of 1/(1 - x), locally defined just on the interval
-1 < x < 1.
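

As a rough numerical illustration of this local agreement, the sketch below compares partial sums of 1 + x + x^2 + ... with 1/(1 - x). The function name `geometric_partial_sum` and the sample values of x are choices made here, not part of the text.

```python
def geometric_partial_sum(x, terms):
    """Return 1 + x + x**2 + ... + x**(terms - 1)."""
    total, power = 0.0, 1.0
    for _ in range(terms):
        total += power
        power *= x
    return total

# Inside the interval -1 < x < 1 the partial sums settle down to 1/(1 - x).
for x in (0.5, -0.9):
    print(x, geometric_partial_sum(x, 200), 1 / (1 - x))

# At x = -1 (Grandi's sum) the partial sums oscillate between 1 and 0,
# even though 1/(1 - x) is perfectly well defined there.
print([geometric_partial_sum(-1, n) for n in range(1, 7)])
```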


A general power series defines a function f(x) for those x where the
power series converges. This will occur for x in an interval
-R < x < R, for some R in the range 0 ≤ R ≤ ∞. R is called the
radius of convergence, and we set R = ∞ when the series converges
for all values of x. The power series does not converge when x > R
or x < -R, and may or may not converge when x = R or x = -R. In
the given example, R = 1. Importantly, a power series defines a
differentiable function f(x) where it converges and the derivative
f'(x) can be found by differentiating the power series term-by-term.
Recalling that the derivative of x^n equals n x^{n-1}, we obtain


f'(x) = c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \cdots,


and this can be repeatedly applied to find power series for the
higher derivatives of f(x). Conversely, we might ask: what
functions can be represented by a power series? Certainly, such a
function must be repeatedly differentiable. We might begin with a
function f(x) and seek to find a power series representing it so that


f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \cdots.


Our problem is to determine c_0, c_1, c_2, ... We can quickly find c_0 by
setting x = 0 in the above so that c_0 = f(0). And if we repeatedly
differentiate the above and set x = 0, we find


f'(x) = c_1 + 2c_2 x + 3c_3 x^2 + 4c_4 x^3 + \cdots, giving c_1 = f'(0),
f''(x) = 2c_2 + 6c_3 x + 12c_4 x^2 + 20c_5 x^3 + \cdots, giving 2c_2 = f''(0),
f'''(x) = 6c_3 + 24c_4 x + 60c_5 x^2 + 120c_6 x^3 + \cdots, giving 6c_3 = f'''(0).


So


c_1 = f'(0), \quad c_2 = \frac{f''(0)}{1 \times 2}, \quad c_3 = \frac{f'''(0)}{1 \times 2 \times 3},


and, generally, we find that


c_n = \frac{f^{(n)}(0)}{n!},


where f^{(n)}(x) denotes the nth derivative of f(x), the function
arrived at when f(x) is differentiated n times, and where


n! = 1 \times 2 \times 3 \times \cdots \times n,


which is read as ‘n factorial’. So, we might expect that 


f(x) = f(0) + f'(0) x + \frac{f''(0)}{2!} x^2 + \cdots + \frac{f^{(n)}(0)}{n!} x^n + \cdots.


This is called the Taylor series of f(x) centred at 0 (also known as
the Maclaurin series of f(x)), and the same reasoning can be
applied to arrive at the general Taylor series centred at a real
number a:


f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!} (x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!} (x - a)^n + \cdots.


These series are named after Brook Taylor, who first published on 
them in 1715, and Colin Maclaurin, though Newton had been 
aware of the general form of Taylor series as early as 1691, and 
James Gregory possibly even earlier. 


Recall we noted earlier that the derivative of e^x is e^x. In fact, the
exponential function f(x) = e^x can be uniquely characterized by
the properties


f'(x) = f(x) and f(0) = 1.


This means that f^{(n)}(x) = e^x and f^{(n)}(0) = 1 for all n; hence the
Taylor series of the exponential function is


e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots.


This series converges for all real x. It also gives us a new definition
for e = e^1, namely


e = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots,


which converges to e much faster than the previous limit of Harriot 
and Bernoulli. 
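

To get a feel for the difference in speed, the sketch below compares partial sums of the series with the compound-interest expression (1 + 1/n)^n. It assumes that this expression is the earlier limit being referred to; the function names are illustrative.

```python
import math

def e_from_series(terms):
    """Sum 1/0! + 1/1! + ... + 1/(terms - 1)!."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term /= n + 1
    return total

def e_from_limit(n):
    """Assumed form of the earlier limit: (1 + 1/n)**n."""
    return (1 + 1 / n) ** n

# Error of each approximation for the same value of the parameter.
for k in (5, 10, 15):
    print(k, abs(math.e - e_from_series(k)), abs(math.e - e_from_limit(k)))
```

With fifteen terms the series error is already far below anything the limit expression achieves for n = 15.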


Unfortunately, the problem of which real functions can be 
represented by power series is not particularly simple. 

Given a repeatedly differentiable function f(x), it is possible
to write down its Taylor series centred at 0 as above. That
Taylor series necessarily agrees with f(x) at x = 0, but there
are examples where the Taylor series and the function agree
only at x = 0. A function is said to be analytic at a point if
it is repeatedly differentiable and agrees with its Taylor series 
on an interval around that point. We will see (Chapter 7) that 
this situation resolves much more simply with complex 
functions. 


Radians and trigonometry 


Figure 12(a) shows a triangle ABC with a right angle at C.
Denoting the angle ∠BAC as x, we can define two functions sine
and cosine of x, denoted by sin(x) and cos(x), as


\sin(x) = \frac{\text{length of } BC}{\text{length of } AB}, \quad \cos(x) = \frac{\text{length of } AC}{\text{length of } AB}.

These are functions of the angle x, rather than the triangle 
ABC: if we scale the triangle to AB’C’, the three sides 
scale by the same factor and so the above fractions stay the 
same. The functions sine and cosine are referred to as 
trigonometric ratios or trigonometric functions. Pythagoras’
theorem states that


(\text{length of } AB)^2 = (\text{length of } AC)^2 + (\text{length of } BC)^2,


and so the following identity holds:


(\cos(x))^2 + (\sin(x))^2 = \left( \frac{\text{length of } AC}{\text{length of } AB} \right)^2 + \left( \frac{\text{length of } BC}{\text{length of } AB} \right)^2 = 1.


This geometric argument works for angles x up to a right angle.


12(a). A right-angled triangle. 12(b). x = 30° and x = 60°. 12(c). x = 45°.
12(d). The circular functions’ graphs.


Sine and cosine are also called the circular functions, as a general
point P on the circle centred on the origin O and with radius 1
has co-ordinates (cos(x), sin(x)) for some x (Figure 12(d)).
Further, as the point P traces out the whole circle, we can draw
out the graphs of sin(x) and cos(x). It is then apparent that sine
and cosine have a period of 360°. That is,


sin(x + 360°) = sin(x), cos(x + 360°) = cos(x),


because after P has moved on 360°, a whole revolution, it has
returned to the same point of the circle.


When angles are first introduced, they are typically measured in
degrees, denoted by the symbol °. There are 360° in a whole
angle, and 90° in a right angle. From the definition, and from
considering the drawn triangles (Figures 12(b), 12(c)), we can
see that


\sin(0^\circ) = \cos(90^\circ) = 0, \quad \sin(30^\circ) = \cos(60^\circ) = \frac{1}{2}, \quad \sin(45^\circ) = \cos(45^\circ) = \frac{1}{\sqrt{2}},


\sin(60^\circ) = \cos(30^\circ) = \frac{\sqrt{3}}{2}, \quad \sin(90^\circ) = \cos(0^\circ) = 1.


However, mathematicians, with good reason, use radians to
measure angles; the benefits of using radians are particularly clear
in calculus. There are 2π radians in a whole angle, rather than
360°, so that 1 radian equals 360°/(2π), roughly 57°. The corresponding
values for sine and cosine, when using radians, are given below:


\sin(0) = 0, \quad \sin(\pi/6) = \frac{1}{2}, \quad \sin(\pi/4) = \frac{1}{\sqrt{2}}, \quad \sin(\pi/3) = \frac{\sqrt{3}}{2}, \quad \sin(\pi/2) = 1,


\cos(0) = 1, \quad \cos(\pi/6) = \frac{\sqrt{3}}{2}, \quad \cos(\pi/4) = \frac{1}{\sqrt{2}}, \quad \cos(\pi/3) = \frac{1}{2}, \quad \cos(\pi/2) = 0.


In geometry, some formulae are improved with radians: for
example, the length of a circular arc, radius r and angle x, equals rx
rather than the messier formula seen when using degrees, πrx/180. In
calculus, though, the use of radians is crucial.


Firstly, the derivative of sin(x) is cos(x) and the derivative of cos(x)
is -sin(x), but these facts are true only if we are using radians to
measure angles. Secondly, from these derivatives we can determine
the Taylor series for sine and cosine. The successive derivatives of
sin(x) are


sin(x), cos(x), -sin(x), -cos(x), sin(x), ...,


repeating every four. Setting x = 0, we obtain the sequence
0, 1, 0, -1, 0, 1, 0, -1, ..., so the Taylor series for sine is given by


\sin(x) = 0 + \frac{1}{1!} x + \frac{0}{2!} x^2 + \frac{-1}{3!} x^3 + \frac{0}{4!} x^4 + \frac{1}{5!} x^5 + \cdots = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots,


and similarly


\cos(x) = 1 + \frac{0}{1!} x + \frac{-1}{2!} x^2 + \frac{0}{3!} x^3 + \frac{1}{4!} x^4 + \cdots = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots.


These series converge for all real x and again are correct, provided
radians are used. They were first determined in Europe by
Newton and appear in his De Analysi, but had been known to
Madhava centuries earlier. Analytically, it is beneficial to take
them as our definitions for sine and cosine. Term-by-term
differentiation can be validly applied to these series and, further,
the rules of differentiation can be used to prove identities such as
(sin(x))^2 + (cos(x))^2 = 1 for all real x. (See Appendix. Also
appearing in the Appendix is a derivation of Madhava’s series for π
using Taylor series.)
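

The role of radians can be checked numerically. The sketch below sums the first few terms of the two series and compares them with the special values listed earlier; the function names and the default of twelve terms are choices made here for illustration.

```python
import math

def sin_series(x, terms=12):
    """Partial sum x - x**3/3! + x**5/5! - ... (x in radians)."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Next term: multiply by -x**2 / ((2k + 2)(2k + 3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(x, terms=12):
    """Partial sum 1 - x**2/2! + x**4/4! - ... (x in radians)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        # Next term: multiply by -x**2 / ((2k + 1)(2k + 2)).
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

print(sin_series(math.pi / 6))           # 30 degrees expressed in radians
print(cos_series(math.pi / 3))           # 60 degrees expressed in radians
print(sin_series(1.3) ** 2 + cos_series(1.3) ** 2)
```

Feeding the number 30 directly into `sin_series` would not give sin(30°); the input has to be converted to radians first, for example with `math.radians(30)`.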


Euler 


Leonhard Euler (Figure 13(a)), pronounced ‘oil-er’, was a prolific 
Swiss mathematician and a titan of 18th-century mathematics, 
with over 800 papers bearing his name. He made major 
contributions across mathematics—number theory, fluid 
dynamics, calculus of variations (Chapter 5), complex functions 
(Chapter 7)—but especially in the study of infinite sums. He is 
particularly remembered for evaluating the infinite sum S where


S = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots,


the so-called Basel problem. He also produced some of the first
topological results, showing that it’s impossible to traverse the 
seven bridges in Königsberg without repetition (Figure 13(b)), and 
he showed that for a (convex) polyhedron V - E + F = 2, where
V, E, F respectively denote the number of vertices (corners), edges,
and faces of the polyhedron. Both these results depend on shape
(e.g. how the points are connected) rather than geometry (e.g. the
lengths of the edges). Much modern mathematical notation is
due to him; he introduced the notation e for Euler’s number, i for
the square root of -1 (Chapter 7), and f for a function, and
popularized the notation for π.


13(a). Leonhard Euler. 13(b). Königsberg’s bridges as a graph.
13(c). Showing S converges.


We can use calculus to show that the sum S is finite. Note that the 
rectangles under the graph of y = 1/x^2 (Figure 13(c)) have total
area S which is then less than


1 + \int_1^\infty \frac{dx}{x^2},


as this equals the area of the first rectangle and the area under the
graph for x > 1. As -1/x is an antiderivative of 1/x^2, the
fundamental theorem shows


1 + \int_1^X \frac{dx}{x^2} = 1 + \left[ -\frac{1}{x} \right]_1^X = 2 - \frac{1}{X},


which approaches 2 as X becomes large. It follows that S is finite 
and less than 2. 
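

The bound can be watched in action. The sketch below assumes S is the Basel sum 1 + 1/2^2 + 1/3^2 + ... and compares its partial sums with 2 - 1/X; the comparison with π^2/6, the value Euler found, is included only as a reference point.

```python
import math

def basel_partial_sum(n):
    """Return 1/1**2 + 1/2**2 + ... + 1/n**2."""
    return sum(1 / k**2 for k in range(1, n + 1))

def area_bound(X):
    """1 + integral of dx/x**2 from 1 to X, which equals 2 - 1/X."""
    return 2 - 1 / X

for n in (1, 10, 1000, 100000):
    print(n, basel_partial_sum(n), area_bound(n))

print("pi**2 / 6 =", math.pi**2 / 6)
```

Each partial sum stays at or below the corresponding bound, and both columns stay below 2.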


By modern standards, Euler’s first solution of the Basel problem 
was inventive but cavalier; details are in the Appendix. This 
highlights how, despite all the progress and creativity of the 18th 
century, mathematical rigour had not progressed far. And whilst 
Euler, in his seminal Introductio in Analysin Infinitorum of 1748, 
made central for the first time the notion of function, his 
definition (that a function is described by a single analytic
expression) would ultimately prove too restrictive and even
caused difficulties at the time for the physical systems
mathematicians were modelling (as we will note with the wave
equation in Chapter 6).


Differential equations 


The greatest application of calculus is a consequence of how well 
the world around us can be modelled by differential equations. 
Differential equations describe theories of gravity (Newtonian and 
Einsteinian), electromagnetism (Maxwell’s equations, wave 
equation, Laplace’s and Poisson’s equations), classical mechanics 
(Lagrange’s and Hamilton’s equations), quantum theory 
(Schrödinger’s equation), fluid dynamics (Navier-Stokes
equations), economics and finance (Black-Scholes equation), 
thermodynamics (heat equation), and mathematical biology 
(predator-prey interactions and epidemiology), etc. 


A differential equation is an equation involving a function and 
its derivatives. Such equations arise quite naturally—for example, 
applying Newton’s second law of motion to any system states 
something about acceleration, a second derivative. Suppose a 
particle moves vertically under gravity, having height h(t) over
the ground at time t. Ignoring air resistance and assuming
gravity (denoted as g) is constant, h(t) satisfies the differential
equation


h''(t) = -g.


This differential equation has order 2, or is second order, as the 
highest derivative involved is the second derivative. 


This equation states that the particle is accelerating due to gravity,
and the minus sign denotes that gravity acts downwards. We can
find the general solution for h(t) by integrating twice. An
antiderivative of -g is -gt, so the general antiderivative is -gt + c_1,
where c_1 is a constant, giving


h'(t) = -gt + c_1.


And an antiderivative of -gt + c_1 is -(1/2)gt^2 + c_1 t, so that


h(t) = -\frac{1}{2} g t^2 + c_1 t + c_2,


where c_2 is a second constant. This does not specify the particle’s
trajectory without further information; rather the above
expression describes all possible flights. But, say, knowing the
initial height h(0) = c_2 and initial velocity h'(0) = c_1 determines
h(t) exactly. Such a description of a system—a differential 
equation and initial conditions—is called an initial value 
problem. 


Exponential growth can also be characterized by a differential
equation. Physically, this might represent the growth in the
number N(t) of bacteria with time t from a single bacterium, while
there is sufficient food or energy resource. If r, a positive constant,
is the growth rate of the bacteria, then N(t) satisfies the initial
value problem


N'(t) = r N(t), \quad N(0) = 1.


Note N (t) can be a general positive number, whilst the number of 
bacteria is a whole number, but the above provides a reasonable 
approximation of the reality of the system. 


Differential equations can often be solved using power series.
We might try a solution of the form


N(t) = c_0 + c_1 t + c_2 t^2 + c_3 t^3 + \cdots


and substitute this into the above differential equation. Using
the initial condition, we can see that c_0 = N(0) = 1. And
differentiating term-by-term, the differential equation now
reads as


c_1 + 2c_2 t + 3c_3 t^2 + 4c_4 t^3 + \cdots = r + r c_1 t + r c_2 t^2 + r c_3 t^3 + \cdots.


Comparing the coefficients of like powers on the left- and right-hand
sides gives


c_1 = r, \quad c_2 = \frac{r c_1}{2} = \frac{r^2}{2!}, \quad c_3 = \frac{r c_2}{3} = \frac{r^3}{3!}, \quad c_4 = \frac{r c_3}{4} = \frac{r^4}{4!},


and so on. From this we can see that the solution is


N(t) = 1 + rt + \frac{r^2 t^2}{2!} + \frac{r^3 t^3}{3!} + \frac{r^4 t^4}{4!} + \cdots = e^{rt}.


This method is valid, when it works, in that it will yield a solution, 
but not all differentiable functions can be represented by power 
series. 


The solution N(t) = e^{rt} cannot be realistic for all times t, as the
function grows without bound. Eventually resources will become
limited, so we might introduce a population capacity K to make the
model more realistic, as Pierre Verhulst did in 1838 with the
differential equation


N'(t) = r N(t) \left( 1 - \frac{N(t)}{K} \right).


His equation has a drag factor of 1 - N/K on the growth rate.
When N is small compared with K, the growth rate is still
approximately r, and N grows almost exponentially, but as N
approaches the capacity K, the growth rate becomes close to zero.
The S-like solution to Verhulst’s equation appears in Figure 14.


14. S-like growth for Verhulst’s model.
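

The S-like shape can be reproduced numerically. The sketch below assumes Verhulst’s equation in its standard form N'(t) = rN(1 - N/K) and steps it forward with Euler’s method, a deliberately simple scheme chosen here for illustration; the parameter values are hypothetical, and the closed-form logistic solution is used only as a cross-check.

```python
import math

def logistic_euler(r, K, N0, t_end, steps):
    """Euler's method for N'(t) = r*N*(1 - N/K) with N(0) = N0."""
    dt = t_end / steps
    N = N0
    for _ in range(steps):
        N += dt * r * N * (1 - N / K)
    return N

def logistic_exact(r, K, N0, t):
    """Closed-form solution of the same initial value problem."""
    return K / (1 + (K / N0 - 1) * math.exp(-r * t))

# Hypothetical values: growth rate 1, capacity 1000, one initial bacterium.
r, K, N0 = 1.0, 1000.0, 1.0
for t in (1.0, 5.0, 10.0, 30.0):
    approx = logistic_euler(r, K, N0, t, steps=20000)
    print(t, approx, logistic_exact(r, K, N0, t), math.exp(r * t))
```

Early on the numbers track e^{rt}; later they level off just below K.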


In these examples a single variable, h or N, determines the status of
a system, but some systems involve more than one variable. For 
example, a system involving competing populations of foxes F(t) 
and rabbits R(t) needs both population sizes to be specified. 
Equations modelling such systems were studied by the American 
biologist Alfred Lotka and Italian mathematician Vito Volterra, 
who independently arrived at the differential equations 


F'(t) = -mF + aFR, R'(t) = bR - kFR,


where a, b, k, m are positive constants. The two equations model 
the system by assuming: 


• the rabbits breed at a certain rate b;


• the number of rabbits being killed, -kFR, is proportional to both
the rabbit population (the more rabbits there are, the more get
caught) and the number of foxes (the more predators, the more
rabbits killed);


• the foxes rely on the rabbits as a food source to multiply, so, for the
reasons just given, their growth term is an FR term countered by a
term, -mF, proportional to F, due to death from disease and old
age.


These simultaneous differential equations lead to periodic
solutions F(t), R(t), which are plotted in Figure 15(a). Some of the
different possible egg-shaped paths that the point (F(t), R(t))
might travel are also plotted (Figure 15(b)). If, at a certain
time, we had F_0 foxes and R_0 rabbits, the point (F_0, R_0) would lie
on one such egg-shaped path; as time progresses the point
(F(t), R(t)) would move around that path, eventually returning to
the same point (F_0, R_0) and repeating forever.


15(a). Fox and rabbit populations. 15(b). The cycles (F, R) travels.
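

A numerical sketch makes the closed paths plausible. The code below integrates the two equations with the classical fourth-order Runge-Kutta method (a choice made here, not taken from the text) for hypothetical constants a, b, k, m. It also tracks the quantity aR - m ln R + kF - b ln F, which a short calculation shows is constant along any solution; this is why the point (F, R) stays on one closed curve.

```python
import math

def lotka_volterra_rk4(F0, R0, a, b, k, m, t_end, steps):
    """Integrate F' = -m*F + a*F*R, R' = b*R - k*F*R by classical RK4."""
    def rates(F, R):
        return (-m * F + a * F * R, b * R - k * F * R)

    dt = t_end / steps
    F, R = F0, R0
    path = [(F, R)]
    for _ in range(steps):
        s1 = rates(F, R)
        s2 = rates(F + dt / 2 * s1[0], R + dt / 2 * s1[1])
        s3 = rates(F + dt / 2 * s2[0], R + dt / 2 * s2[1])
        s4 = rates(F + dt * s3[0], R + dt * s3[1])
        F += dt / 6 * (s1[0] + 2 * s2[0] + 2 * s3[0] + s4[0])
        R += dt / 6 * (s1[1] + 2 * s2[1] + 2 * s3[1] + s4[1])
        path.append((F, R))
    return path

def invariant(F, R, a, b, k, m):
    """Quantity that stays constant along exact solutions."""
    return a * R - m * math.log(R) + k * F - b * math.log(F)

# Hypothetical constants and starting populations, for illustration only.
a, b, k, m = 0.01, 1.0, 0.1, 0.5
path = lotka_volterra_rk4(5.0, 40.0, a, b, k, m, t_end=30.0, steps=30000)
values = [invariant(F, R, a, b, k, m) for F, R in path]
print("drift in invariant:", max(values) - min(values))
print("fox range:", min(F for F, _ in path), max(F for F, _ in path))
```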


Émile Picard showed, given a set of reasonable though technical
criteria, that there exists a unique solution to the initial value
problem


\frac{dy}{dx} = f(x, y), \quad y(x_0) = y_0,


which is locally defined. Picard’s proof is constructive, defining a
sequence of functions y_n(x) which converge to the solution. The
sequence is defined iteratively by


y_0(x) = y_0, \quad y_{n+1}(x) = y_0 + \int_{x_0}^{x} f(t, y_n(t)) \, dt.


As an example, the exponential function satisfies this initial value
problem when f(x, y) = y, x_0 = 0, y_0 = 1. In this case,
Picard’s theorem generates the functions


y_0(x) = 1, \quad y_1(x) = 1 + \int_0^x 1 \, dt = 1 + x,


y_2(x) = 1 + \int_0^x (1 + t) \, dt = 1 + x + \frac{x^2}{2},


y_3(x) = 1 + \int_0^x \left( 1 + t + \frac{t^2}{2} \right) dt = 1 + x + \frac{x^2}{2} + \frac{x^3}{6}.


We can see this sequence equals the partial sums of the exponential
function’s Taylor series.
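

For this particular example the iteration can be carried out exactly, since each y_n is a polynomial. In the sketch below a polynomial is stored as its list of coefficients, and one Picard step integrates term by term and then adds y_0 = 1; the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def picard_step(coeffs):
    """Map y_n(t) = sum(coeffs[k] * t**k) to y_{n+1}(x) = 1 + integral of y_n from 0 to x."""
    # Integrating c * t**k from 0 to x gives c / (k + 1) * x**(k + 1).
    integrated = [c / (k + 1) for k, c in enumerate(coeffs)]
    # The constant term of the new polynomial is the initial value y_0 = 1.
    return [Fraction(1)] + integrated

def picard_iterates(count):
    """Return the coefficient lists of y_0, y_1, ..., y_count."""
    iterates = [[Fraction(1)]]
    for _ in range(count):
        iterates.append(picard_step(iterates[-1]))
    return iterates

for n, coeffs in enumerate(picard_iterates(4)):
    print(n, [str(c) for c in coeffs])
```

The coefficient of x^k in every iterate that contains it is 1/k!, matching the Taylor series of the exponential function.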


It may seem hard to imagine a situation where a solution is not 
defined, at least locally. An initial value problem provides a point 
on the solution (x_0, y_0) and a ‘direction of travel’ by specifying the
gradient there. It could become the case that following that 
direction of travel might lead to y(x) or y'(x) becoming infinite 
(Figures 16(a), 16(b)). So it’s reasonable that the solution may only 
be locally defined. 


But if Picard’s criteria are not met, then there can be more than one
way to proceed along the direction of travel (Figure 16(c)). The
general solution to the initial value problem


\frac{dy}{dx} = 2\sqrt{y}, \quad y(0) = 0


is the infinite family of functions


y(x) = \begin{cases} 0 & \text{if } x \le a, \\ (x - a)^2 & \text{if } x > a, \end{cases}


where 0 ≤ a ≤ ∞. Note that when y = 0, the gradient dy/dx is
also zero. The issue is that while y remains zero, there are two
possible ways to follow the given direction of travel: continue along
the x-axis or start on a half-parabola. But once the solution moves
on to a parabola, the solution must continue along it.


16(a). y(x) becomes infinite. 16(b). y'(x) becomes infinite.
16(c). Non-unique progress.
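

That several different functions really do solve the same initial value problem can be spot-checked numerically. The sketch below assumes the equation dy/dx = 2√y with y(0) = 0 and tests members of the family for a few hypothetical values of a, estimating the derivative by a central difference.

```python
import math

def solution(a):
    """Member of the family: 0 up to x = a, then the half-parabola (x - a)**2."""
    def y(x):
        return 0.0 if x <= a else (x - a) ** 2
    return y

def residual(y, x, h=1e-6):
    """Central-difference estimate of y'(x) minus 2*sqrt(y(x))."""
    slope = (y(x + h) - y(x - h)) / (2 * h)
    return slope - 2 * math.sqrt(y(x))

for a in (0.0, 1.0, 2.5):
    y = solution(a)
    worst = max(abs(residual(y, x)) for x in (0.5, 1.5, 3.0, 4.0))
    print(a, y(0.0), worst)
```

Every choice of a gives y(0) = 0 and a residual that is zero up to rounding, so each one is a genuine solution.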


Bolzano and Weierstrass 


Finally, in the mid 19th century, a definition of limit would be 
found which made no reference to infinitesimals. This definition 
is usually attributed to the German mathematician Karl 
Weierstrass, though such definitions had been implicit in Cauchy’s 
Cours d’Analyse of 1821 and explicit in the work of 1810-17 of
Bernard Bolzano, which went unnoticed in his lifetime. 


We already met in Chapter 1 the definition of a sequence
x_1, x_2, x_3, ... having a limit L. This meant that, however close we
wished the sequence to get to L, this would eventually happen and
continue so. More formally, this means that given any positive ε
(this is our notion of ‘close’, so we typically consider ε as small),
there exists a positive integer N (this is a point from which
‘eventually’ starts happening) for which x_n is suitably close to L
when n ≥ N.


We similarly wish to define what it means for a real function f(x)
to have a limit L as x gets near to a. Note, generally, that f(a) may
not be defined and, even if it is, f(a) may be different to L. If
L = f(a), then f(x) is said to be continuous at a (Figure 17(a));
likewise, the function may have no limit at a and distinct limits
from the left and right (Figure 17(b)). The study of continuous real
functions, and their generalization to metric spaces and topological
spaces, is an important part of analysis, but one which is covered in
detail in Topology: A Very Short Introduction.


17(a). A continuous function. 17(b). A discontinuous function.


We want to guarantee f(x) is sufficiently close to L if x is sufficiently
close to a. So we require, for any positive ε (the demanded closeness
in the outputs), that there exists a positive δ (a closeness in the
inputs to meet the demand) such that f(x) is within ε of L whenever
x is within δ of a. It’s important to note that:


• the definition makes no reference to infinitesimals;


• we require the output f(x) to be constrained in a certain way if x is
appropriately constrained;


• we need to be able to do this for all constraints ε: for each choice of ε
we will need a choice of δ that meets the requirement;


• for a smaller ε, δ will usually need to be smaller as well;


• given ε, any δ that meets the requirement is fine (we’re not looking
for a largest δ, say);


• the ‘faster’ the function f(x) is changing at a, the smaller δ will need
to be, relative to ε.


So, we can finally and rigorously define what it means for the
function f(x) to be differentiable at a. This means that the
approximating gradient


\frac{f(a + h) - f(a)}{h}


has a limit as h approaches 0, and we denote this limit as f'(a).
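

As a small illustration of how ε and δ interact in this definition, take the hypothetical example f(x) = x^2 at a = 3. There the approximating gradient equals 6 + h exactly, so it lies within ε of the limit 6 whenever 0 < |h| < δ with δ = ε. The sketch below spot-checks this for a few sample values of h; it is a check of particular cases, not a proof.

```python
def difference_quotient(f, a, h):
    """Approximating gradient (f(a + h) - f(a)) / h for h != 0."""
    return (f(a + h) - f(a)) / h

def square(x):
    return x * x

def meets_demand(epsilon, a=3.0, limit=6.0):
    """Check |quotient - limit| < epsilon for sample h with 0 < |h| < delta."""
    delta = epsilon  # for x**2 the quotient is 2a + h, so delta = epsilon works
    samples = [0.9 * delta, -0.5 * delta, 0.01 * delta]
    return all(abs(difference_quotient(square, a, h) - limit) < epsilon
               for h in samples)

for epsilon in (1e-1, 1e-3, 1e-5):
    print(epsilon, meets_demand(epsilon))
```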


This terminology is often referred to as ε-δ analysis (read ‘epsilon-delta’).
This is the standard notation of analysis, and is a final shift
from the informality or imprecision of previous definitions. It is the 
hallmark of modern analysis, which by Weierstrass’ time had 
carved out for itself a canon of its own within mathematics. 
Intuitively, you may think of a continuous function as one with a 
graph which can be drawn without taking your pen off the paper, 
and that a function is differentiable if the graph does not change 
jerkily, but both these notions are woefully insufficient for proving 
results about continuous and differentiable functions. 


Riemann’s integral 


During the 19th century, integration theory also advanced 
considerably. In his Résumé of 1823, Cauchy defined integrals of 
continuous functions on intervals a ≤ x ≤ b. His work is important
in various ways: he considered integration in the context of signed 
areas, rather than as simply antidifferentiation—the inverse of 
differentiation—and he (largely) proved the fundamental theorem 
(largely as there was an important technical gap in his argument). 
However, Cauchy’s treatment, of functions that were either 
continuous or had only finitely many discontinuities, wasn’t 
sufficiently broad for the evolving notion of a function. 


In 1854 Riemann developed a more general theory of integration,
treating bounded functions on bounded intervals, though what
follows below is an equivalent treatment of Riemann’s integral by
the French mathematician Gaston Darboux from 1875. Note that,
compared with 17th- and 18th-century notions of an integral, this
definition makes no reference to infinitesimals.


18(a). A step function. 18(b). A step function below f(x).
18(c). A step function above f(x).


As an uncontroversial starting point, we define the area of a 
rectangle as equal to its base multiplied by its height. A step 
function φ(x), on an interval a ≤ x ≤ b, is a function whose graph
comprises a finite collection of rectangles, above or below the
x-axis (Figure 18(a)). The integral of a step function is just the sum
of these rectangles’ signed areas. 


Our aim is to assign a real number I to the integral of a bounded
function f(x). If the graph of a step function φ(x) lies entirely
below the graph of f(x), then we would expect I to be at least the
integral of φ(x) (Figure 18(b)). Likewise, if the graph of a step
function ψ(x) lies above the graph of f(x), then we would expect
I to be at most the integral of ψ(x) (Figure 18(c)). For most naturally
occurring functions these requirements specify a unique value for I,
but we will see in Chapter 8 that this is not generally the case. 


More explicitly, given a bounded function f(x) on a ≤ x ≤ b, the lower
Riemann integral of f(x) is the smallest real number I_lower such that


I_{\text{lower}} \ge \int_a^b \varphi(x) \, dx


for every step function φ(x) with φ(x) ≤ f(x) for each x. And the
upper Riemann integral of f(x) is the largest real number I_upper
such that


I_{\text{upper}} \le \int_a^b \psi(x) \, dx


for every step function ψ(x) with ψ(x) ≥ f(x) for each x. Finally,
we say that f(x) is Riemann integrable if I_lower = I_upper, and call
this common value the Riemann integral of f(x).
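

The squeeze between step functions can be seen numerically for a simple case. The sketch below takes the hypothetical example f(x) = x^2 on 0 ≤ x ≤ 1, which is increasing, so on each subinterval the left-hand value gives a step function below f and the right-hand value one above it. The function name and the numbers of strips are choices made here.

```python
def darboux_sums(f, a, b, n):
    """Integrals of step functions below and above an increasing f on [a, b]."""
    width = (b - a) / n
    # For increasing f, the left endpoint gives the least value on each strip
    # and the right endpoint the greatest.
    lower = sum(f(a + i * width) * width for i in range(n))
    upper = sum(f(a + (i + 1) * width) * width for i in range(n))
    return lower, upper

def square(x):
    return x * x

for n in (10, 100, 1000):
    lower, upper = darboux_sums(square, 0.0, 1.0, n)
    print(n, lower, upper, upper - lower)
```

As n grows the two sums close in on a common value from either side, here 1/3, which is then the Riemann integral.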


Whilst Riemann’s integral was more general than Cauchy’s, it still 
did not assign integrals to unbounded functions and/or functions 
on unbounded intervals. For example, the integral


\int_1^\infty \frac{dx}{x^2} = 1,


which we met earlier, can only be considered by calculating the 
integral between 1 and X and letting X tend to infinity. The only 
reasonable answer for the area represented by this integral is 1, but 
it cannot be evaluated within Riemann’s theory and the above is 
referred to as an improper Riemann integral. The limitations of 
the Riemann integral are discussed further in Chapter 8. 


The story of calculus puts paid to any notion that mathematical
concepts are conceived complete and polished, or the idea that we
are not doing mathematics until the final ‘i’ is dotted. Indeed, in
much of what follows, I will continue referring to infinitesimals as
I introduce new types of derivatives and integrals (such language
is often the most convenient for giving an informal sense of a
concept), but the importance for mathematics of being able to
apply calculus rigorously, without reference to ambiguous notions,
and to develop general theories of modern analysis, cannot be
overstated.


Chapter 4
Should I believe my computer?


Scientists, engineers, and social scientists use mathematics in 
much of their work, but you may not have considered how the 
theories of an ideal mathematical world are brought to bear on the 
real one. A sense of reality can only be achieved via 
experimentation, but how should we move from a collection of 
experimental data to wholly defined functions? 


Further, the problems met in school classrooms commonly leave a 
false impression. Such problems usually have exact answers, but 
with real-world problems, there is typically no hope of finding the 
exact answer. 


By way of a first example, consider the following equation:


2^{x-2} = x.


It’s not hard to spot that x = 4 is a solution, as 2^{4-2} = 2^2 = 4. But
if we sketch the two graphs y = x and y = 2^{x-2} (Figure 19), then
we see that there is a second intersection around where x = 0.3.
How might we determine the second solution?


We should not be too ambitious: there are deep theorems of
mathematics which prove that we cannot expect to solve general
problems in terms of the so-called ‘elementary functions’, such as
polynomials, exponentials, logarithms, and trigonometric
functions. But perhaps an approximate answer, to some desired
accuracy, will suffice.


19. Graphs of y = 2^{x-2} and y = x.


Table 2. Comparing values of 2^{x-2} and x


x       2^{x-2} (to 4 d.p.)   Comparison
0.2     0.2872                2^{x-2} > x
0.3     0.3078                2^{x-2} > x
0.4     0.3299                2^{x-2} < x
0.31    0.3099                2^{x-2} < x


In Table 2 we compare 2^{x-2} and x for some values near x = 0.3.


We see that 2^{x-2} is greater than x at x = 0.2 and x = 0.3, but less
than x at x = 0.4. The two graphs have crossed somewhere
between x = 0.3 and x = 0.4, meaning the other solution is
between 0.3 and 0.4. Trying further values, we see at x = 0.31 that
2^{x-2} is less than x, and we now know that the second solution lies
between 0.3 and 0.31. We are getting closer, though there is still
much work to be done if we want, say, to find the second solution to
six decimal places. Can we be more systematic?


20. Cobwebbing to find the second solution.


Another approach is termed cobwebbing (Figure 20). The idea is to
start with a nearby estimate of a solution and, ideally, produce
a sequence of increasingly accurate approximations. The approach
can be applied to solve equations of the form x = f(x), the so-called
fixed points of the function f(x). In our example, f(x) = 2^{x-2}.


We’ve taken x_1 = 0.4 as our initial estimate, as plotted on the
x-axis (Figure 20). We then draw a line vertically from (x_1, 0) up to
the graph y = f(x) and denote this second point (x_1, x_2) so that
x_2 = f(x_1). Moving to the left, we get a third point (x_2, x_2) when we
reach the y = x line. (Note that x_2 is closer to the solution than our
original estimate x_1.) By repeating this process, generating points
(x_2, x_3), (x_3, x_3), (x_3, x_4), ... that lie on the ‘descending staircase’,
we produce a sequence of estimates x_1, x_2, x_3, ... getting ever
closer to the solution. In each case, x_{n+1} = f(x_n), and the values of
these estimates are given in Table 3.


Table 3. Fixed-point iterations using f(x) = 2^{x-2}


x_1    0.4          x_5    0.3101066    x_9     0.3099074
x_2    0.3298770    x_6    0.3099498    x_10    0.3099070
x_3    0.3142265    x_7    0.3099161    x_11    0.3099070
x_4    0.3108362    x_8    0.3099089    x_12    0.3099070


21(a). Cobwebbing near the x = 4 solution. 21(b). Cobwebbing when
-1 < f'(x) < 0.


This fixed-point iteration appears to have provided us with the
solution to six decimal places, namely 0.309907, by the ninth
estimate x_9; this is certainly fewer steps than our initial approach
would have taken. We can verify that this is indeed the solution to
six decimal places by comparing 2^{x-2} and x at x = 0.3099065
and at x = 0.3099075 and showing that 2^{x-2} - x changes
sign between these values. This is because the x in the range
0.3099065 ≤ x < 0.3099075 are precisely those x which round to
six decimal places to 0.309907.
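

Both the iteration and the sign-change check are easy to reproduce. The following sketch generates the first twelve estimates starting from x_1 = 0.4 and then evaluates 2^{x-2} - x at the two bracketing values; the helper names are illustrative.

```python
def f(x):
    return 2 ** (x - 2)

def fixed_point_iterates(x1, count):
    """Return [x_1, x_2, ..., x_count] with x_{n+1} = f(x_n)."""
    xs = [x1]
    for _ in range(count - 1):
        xs.append(f(xs[-1]))
    return xs

xs = fixed_point_iterates(0.4, 12)
for n, x in enumerate(xs, start=1):
    print(f"x_{n} = {x:.7f}")

def g(x):
    """The difference 2**(x - 2) - x, which vanishes at a solution."""
    return f(x) - x

# A change of sign between the bracketing values pins the solution down
# to six decimal places.
lo, hi = 0.3099065, 0.3099075
print(g(lo), g(hi))
```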


The cobwebbing approach converges quickly to the solution
(Figure 20), as the gradient f'(x) at the solution is small
(the curve is close to horizontal). If we had not immediately spotted
that x = 4 is a solution, we might have sought to find that solution
using cobwebbing. However, we can see (Figure 21(a)) that the
estimates x_1, x_2, x_3, ... move away from the solution. The problem
with this solution is that the gradient f'(4) is greater than 1. A fixed
point x is attracting if -1 < f'(x) < 1. When -1 < f'(x) < 0, the
estimates x_1, x_2, x_3, ... converge to the solution (Figure 21(b)),
alternately as over- and underestimates, and the figure looks more
like the eponymous cobweb.
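

The gradient criterion can be checked directly for f(x) = 2^{x-2}, whose derivative is ln(2) · 2^{x-2}. The sketch below evaluates it at both fixed points and then follows a few iterates from a hypothetical starting value of 3.9, close to the solution x = 4.

```python
import math

def f(x):
    return 2 ** (x - 2)

def f_prime(x):
    """Derivative of 2**(x - 2), namely ln(2) * 2**(x - 2)."""
    return math.log(2) * 2 ** (x - 2)

print("gradient near 0.309907:", f_prime(0.309907))
print("gradient at 4:         ", f_prime(4.0))

# Starting close to 4, the estimates drift away from it.
x = 3.9
for _ in range(6):
    x = f(x)
    print(x)
```

The gradient lies strictly between -1 and 1 at the smaller fixed point but exceeds 1 at x = 4, which is why only the former is attracting.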


This cobwebbing approach highlights some of the important, 
general characteristics of numerical analysis: 


• We produce approximations to the exact solution, which can be
made as accurate as we wish.


• We have means of checking our answer is correct to the required
accuracy.


• Ideally these approximations converge quickly to the solution.


Interpolation and extrapolation 


Even the previous problem is idealized, compared with real-world 
problems: we might not have been able to find the exact solution, 
but we could at least describe the problem fully. More realistically, 
the ‘functions’ associated with a real-world problem won’t be so 
fully specified. In practice, we will just have some experimental 
data: say a number of experiments are conducted, and the ith
experiment outputs y_i when we run the experiment with input x_i.
How can we estimate other outputs y when we haven’t run the
experiment with input x?


Suppose we have a data set from six experiments, as in Table 4.


Table 4. Experimental data points


x_i    1         2         3         4         5         6
y_i    0.3010    0.5490    0.8386    1.2348    1.3632    1.7464


These six data points (x_i, y_i) are plotted on the graph in
Figure 22. There is no y-value for x = 1.5, but surely we should be
able to estimate y(1.5). Our problem, if anything, is that there are
many ways to make such an estimate and some academic
judgement is needed to decide on the most appropriate method.
The process of estimating y-values that correspond to x-values
in the given range 1 ≤ x ≤ 6 is known as interpolation, and
the estimation process outside the given range is called
extrapolation.


22. Plot of the data points.


There are various approaches to interpolation, so it would be useful 
to have some priorities for our estimate y(x) to help choose 
between the different methods. It would be convenient if: 


• the formula for $y(x)$ is relatively simple and uncomplicated to
evaluate;

• the function agrees (or almost agrees) with the data points $(x_i, y_i)$;

• the function does not fluctuate wildly between the data points;

• the function $y(x)$ is differentiable at least once, perhaps more
often;

• the function has a form which is plausible, given the nature of the
experiment.


23(a). Polynomial interpolation. 23(b). Interpolation with splines.


Polynomial (or Lagrangian) interpolation: Given n data points, 
there is a unique polynomial of degree less than n whose graph 
passes through the n points. For the given example, this is the 
polynomial of degree five plotted (Figure 23(a)). Two weaknesses 
with this approach are that a degree five polynomial is relatively 
complicated to calculate with, and the polynomial extrapolates 
poorly beyond the range $1 \le x \le 6$.
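The interpolating polynomial can be evaluated directly from the data without first finding its coefficients. The following is a minimal sketch using the Lagrange form; the function name `lagrange_interpolate` is our own, and the data are those of Table 4.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial: equals 1 at xi and 0 at every other data point.
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

# Data from Table 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.3010, 0.5490, 0.8386, 1.2348, 1.3632, 1.7464]

print(lagrange_interpolate(xs, ys, 1.5))   # interpolated estimate of y(1.5)
print(lagrange_interpolate(xs, ys, 8.0))   # extrapolation well outside the data range
```

Comparing the two printed values with the roughly linear trend of the data gives a feel for how poorly a degree five polynomial can behave once we leave the range of the data.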


Splines: One way around the problem of using a single high-degree 
polynomial is to use a number of different polynomials of low 
degree to interpolate the data (Figure 23(b)). For example, a cubic 
spline is a function y(x) such that: 


• $y(x)$ is defined by some cubic (degree 3) polynomial $p_i(x)$ between
$x_i$ and $x_{i+1}$;

• $y(x_i) = y_i$ at each data point;

• $y(x)$ has a continuous second derivative.
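The three conditions above do not quite determine a cubic spline uniquely; two further conditions are needed. The sketch below assumes the so-called natural end conditions (zero second derivative at the first and last data points), which is one common choice and not necessarily the one behind Figure 23(b). The function name `natural_cubic_spline` is hypothetical.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating a cubic spline through the points (xs[i], ys[i]).

    Assumption: 'natural' end conditions, i.e. second derivative zero at both ends.
    """
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    size = n - 2  # number of interior data points
    # Tridiagonal system for the second derivatives m[1..n-2] at the interior points.
    diag = [2.0 * (h[i] + h[i + 1]) for i in range(size)]
    rhs = [6.0 * ((ys[i + 2] - ys[i + 1]) / h[i + 1] - (ys[i + 1] - ys[i]) / h[i])
           for i in range(size)]
    # Forward elimination; the off-diagonal entries of the system are the h values.
    for i in range(1, size):
        w = h[i] / diag[i - 1]
        diag[i] -= w * h[i]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n  # m[0] = m[n-1] = 0 by the natural end conditions
    for i in range(size - 1, -1, -1):
        m[i + 1] = (rhs[i] - h[i + 1] * m[i + 2]) / diag[i]

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped to the data range).
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 2)
        left, right = x - xs[i], xs[i + 1] - x
        return ((m[i] * right ** 3 + m[i + 1] * left ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * right
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * left)

    return spline

xs = [1, 2, 3, 4, 5, 6]
ys = [0.3010, 0.5490, 0.8386, 1.2348, 1.3632, 1.7464]
s = natural_cubic_spline(xs, ys)
print(s(1.5))   # spline estimate of y(1.5)
```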


Least-squares approximation: The data points plotted (Figure 22)
appear to lie, approximately, on a straight line. No single line
passes through all six points, but this is to be expected, as the data
were generated experimentally and there will always be some
experimental error. But is there a ‘best-fit’ line in some sense? The
equation of a line is $y = ax + b$, so for any such line the model’s
estimate of $ax_i + b$ differs a little from the experimentally recorded
answer of $y_i$. The least-squares approximation is the line
$y = ax + b$, for which the total error

$$E(a, b) = (ax_1 + b - y_1)^2 + (ax_2 + b - y_2)^2 + \cdots + (ax_6 + b - y_6)^2$$

is minimized. The error $E(a, b)$ is a function of $a$ and $b$ and can
be minimized using multivariable calculus (Chapter 5). For the
given data the best-fit line (Figure 24) is when

$a = 0.2876$ and $b = -0.0011$.


24. Linear approximation.
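The best-fit values can be checked numerically. The sketch below uses the standard closed-form solution obtained by setting both partial derivatives of $E(a, b)$ to zero; the function name `best_fit_line` is our own, and the data are those of Table 4.

```python
def best_fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (a*x + b - y)**2 over the data points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Setting both partial derivatives of E(a, b) to zero gives these closed forms.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4, 5, 6]
ys = [0.3010, 0.5490, 0.8386, 1.2348, 1.3632, 1.7464]
a, b = best_fit_line(xs, ys)
print(round(a, 4), round(b, 4))   # 0.2876 -0.0011
```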


The actual data in Table 4 was produced using $y_i = (ax_i + b) + \varepsilon$,
where

$a = 0.3126$ and $b = -0.1416$

and where $\varepsilon$ represents a random error from the range
$-0.15 < \varepsilon < 0.15$. These estimates for $a$ and $b$ are not
particularly accurate here, but there are clear reasons for
this: practically, we would hope for more than six data points
to work with; moreover, an experimental error of up to 0.15 is
substantial for outputs $y_i$ in the range 0 to 2. For better estimates
we would need to collect more data and/or improve our
experiment to reduce the errors.


Least-squares approximation can be used for families of curves 
other than straight lines. It may be that the model for an 
experiment implies that a solution should have the form 


$$a \sin(x) + b \cos(x)$$


for some $a$, $b$ yet to be determined. A different error formula $E(a, b)$
could be created as before and minimized, but now using terms
such as $(a \sin(x_i) + b \cos(x_i) - y_i)^2$.
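Because the unknowns $a$ and $b$ still appear linearly, minimizing this error again reduces to solving two linear equations. The sketch below is one way this might be done; the function name `fit_sin_cos` and the test data are hypothetical, with the data generated from known values of $a$ and $b$ simply to check that they are recovered.

```python
import math

def fit_sin_cos(xs, ys):
    """Return (a, b) minimizing the sum of (a*sin(x) + b*cos(x) - y)**2."""
    ss = sum(math.sin(x) ** 2 for x in xs)
    cc = sum(math.cos(x) ** 2 for x in xs)
    sc = sum(math.sin(x) * math.cos(x) for x in xs)
    sy = sum(math.sin(x) * y for x, y in zip(xs, ys))
    cy = sum(math.cos(x) * y for x, y in zip(xs, ys))
    # Normal equations: a*ss + b*sc = sy and a*sc + b*cc = cy, solved by Cramer's rule.
    det = ss * cc - sc * sc
    return (sy * cc - cy * sc) / det, (ss * cy - sc * sy) / det

# Hypothetical, noise-free data generated from a = 2, b = -1, just to check the fit.
xs = [0.5 * k for k in range(8)]
ys = [2.0 * math.sin(x) - 1.0 * math.cos(x) for x in xs]
a, b = fit_sin_cos(xs, ys)
print(a, b)
```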


Numerical differentiation and integration 


In 1768 Euler presented a method for finding approximate
solutions to the initial value problem

$$\frac{dy}{dx} = f(x, y), \qquad y(x_0) = y_0.$$

If we know a point $(x, y)$ on the solution’s graph, then the
above differential equation also tells us the gradient of the
solution at that point, namely $f(x, y)$. The premise of
Euler’s method is that for a suitably small increment $h$ in $x$,
the solution’s graph is approximated well by a line segment
with one endpoint at $(x, y)$ and gradient $f(x, y)$ (Figure 25).
If the increment in $x$ is $h$, then the other endpoint is
at $(x + h, y + h f(x, y))$.


We know for certain that the solution’s graph has a point $(x_0, y_0)$
on it. Working from that point, we can then generate the next
estimate $(x_1, y_1)$ where

$$x_1 = x_0 + h, \qquad y_1 = y_0 + f(x_0, y_0)h,$$

and then (Figure 26) generate subsequent estimates $y_n$ (for $n \ge 1$)
by using

$$x_n = x_0 + nh, \qquad y_n = y_{n-1} + f(x_{n-1}, y_{n-1})h.$$


25. Euler’s method: the first iteration.
26. Euler’s method: successive iterations.


Recall that the exponential function $y = e^x$ is the solution to the
initial value problem $y' = y$ and $y(0) = 1$, so that $f(x, y) = y$.
If we were to apply Euler’s method to this problem, then we would
have $x_n = nh$, generating the estimates

$$y_0 = 1; \qquad y_1 = 1 + h; \qquad y_2 = 1 + h + (1 + h)h = (1 + h)^2,$$

and, generally, we would find that

$$y_n = (1 + h)^n.$$

If we fix $x = nh$ but use decreasing increments $h$, or equivalently
increasing $n$, then the estimate $(1 + x/n)^n$ of $y(x)$ converges to the solution $e^x$,
as this is Harriot’s original definition of the exponential.
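Euler's method is short to implement. The sketch below applies it to $y' = y$, $y(0) = 1$ and compares the estimate of $y(1)$ with $e$ as the number of steps increases; the function names `euler` and `estimate_exp` are our own.

```python
import math

def euler(f, x0, y0, h, n):
    """Apply n steps of Euler's method to y' = f(x, y), y(x0) = y0; return the estimate y_n."""
    x, y = x0, y0
    for _ in range(n):
        y = y + f(x, y) * h   # follow the tangent line for an increment h
        x = x + h
    return y

def estimate_exp(x, n):
    # Euler's method for y' = y, y(0) = 1 with increment h = x / n.
    return euler(lambda x_, y_: y_, 0.0, 1.0, x / n, n)

for n in (10, 100, 1000):
    print(n, estimate_exp(1.0, n), math.e)
```

With ten steps the estimate agrees with $(1 + 1/10)^{10}$, and the gap to $e$ shrinks roughly in proportion to the increment.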


Note that the formula from Euler’s method can be rearranged as

$$\frac{y_n - y_{n-1}}{h} = f(x_{n-1}, y_{n-1}),$$

which is a natural approximation of the differential equation
$y' = f(x, y)$. In fact, such an approximation for $y'$ can be used more
generally for other differential equations. Similar approximations
exist for higher-order derivatives; for example,

$$\frac{y_n - 2y_{n-1} + y_{n-2}}{h^2}$$

provides an approximation for the second derivative $y''$.
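These difference quotients are easy to try out on a function whose derivatives we already know. In the sketch below the second difference is divided by $h^2$; the function names and the choice of $y = x^3$ as test function are our own.

```python
def first_difference(ys, h):
    """Backward-difference estimate of y' from the last two samples."""
    return (ys[-1] - ys[-2]) / h

def second_difference(ys, h):
    """Estimate of y'' from the last three equally spaced samples."""
    return (ys[-1] - 2.0 * ys[-2] + ys[-3]) / h ** 2

h = 0.01
samples = [(k * h) ** 3 for k in range(101)]   # y = x**3 sampled on 0 <= x <= 1
print(first_difference(samples, h))    # roughly y'(1) = 3
print(second_difference(samples, h))   # roughly y''(1) = 6
```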


Since Euler’s time, his method has been variously improved upon, 
and would not be widely used today, but it gives a sense of how one 
might seek approximate solutions to a differential equation when 
no exact solution is possible. One way in which the method is
simplistic is that the same increment $h$ is used throughout.
However, the function $f(x, y)$ may change rapidly for some values
of $x$, compared with others. It would make sense to use much
smaller increments when this is the case.
For other problems, it may be imperative that $y$ is a periodic function:
for example, if it describes the distance to a planet which traces the
same orbit each year. Whatever approximating method is used, the
approximate solution must be periodic for it to make sense physically.


There are also various means for estimating definite integrals. The
function $y = \sin(x^2)$ for $x$ in the interval $0 \le x \le \pi/2$ is graphed
(Figures 27, 28). The value of the desired integral is

$$\int_0^{\pi/2} \sin(x^2)\,dx = 0.828116\ldots$$

First the trapezium rule is employed (Figure 27), dividing the
original interval into four. In general, the trapezium rule splits the
given interval into equally wide subintervals. An approximation to
the graph of $f(x)$ is created by connecting the points $(x_i, f(x_i))$
with lines. The approximating graph overlies trapezia, whose total
(signed) area gives an approximation of the integral.


Simpson’s rule (Figure 28) is a little more refined. An even
number of subintervals needs to be used; here four subintervals
are used, but they are grouped as $x_0, x_1, x_2$ and $x_2, x_3, x_4$.
Simpson’s rule approximates $f(x)$ between $x_0$ and $x_2$ using the
unique quadratic that passes through the first three points
$(x_0, f(x_0))$, $(x_1, f(x_1))$, $(x_2, f(x_2))$; a different quadratic is then
used for the interval between $x_2$ and $x_4$, and so on. Simpson’s rule
gives an estimate for the integral of $f(x)$ by integrating these
quadratics instead.


27. Trapezium rule. 28. Simpson’s rule.


These two rules, using four subintervals, give estimates of 0.79621 
and 0.82845 to five decimal places. In general, Simpson’s rule 
produces better estimates. The rule is named after Thomas 
Simpson, and appears in his calculus text of 1743. But this was a 
‘rediscovery’; it had been known to mathematicians as early as the 
17th century. 
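Both rules reduce to weighted sums of function values. The following sketch uses the standard composite formulas, with function names of our own choosing, and reproduces the two estimates for $\sin(x^2)$ on $0 \le x \le \pi/2$ with four subintervals.

```python
import math

def trapezium(f, a, b, n):
    """Trapezium rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))

def simpson(f, a, b, n):
    """Simpson's rule on [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return h / 3.0 * (f(a) + 4.0 * odd + 2.0 * even + f(b))

g = lambda x: math.sin(x * x)
print(round(trapezium(g, 0.0, math.pi / 2, 4), 5))   # 0.79621
print(round(simpson(g, 0.0, math.pi / 2, 4), 5))     # 0.82845
```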


In all this, note that there are pros and cons to using smaller 
increments. Yes, the accuracy of our estimate will improve, but the 
computational time and memory needed will also increase. 
Depending on the specifics of a problem or experiment, an 
estimate will only be required to a certain accuracy, and it would be 
largely pointless to put resources into obtaining further accuracy. 


Numerical stability and error analysis


Numerical stability relates to the concern that some algebraic 
operations may exacerbate errors or approximations of values. 

A value used in a computation may differ from a notional ‘true’ value 
either because the value was measured experimentally or because 
the value was the true value rounded to a certain number of decimal 
places. Such stability concerns are particularly important in a long 
computation involving thousands of iterations: the issue of 
rounding errors may lead to nonsensical answers being generated. 


If $x$ denotes a true value and $x_1$ approximates $x$, then the difference
in outputs $f(x)$ and $f(x_1)$ is approximately

$$f'(x_1)(x - x_1),$$

provided $f(x)$ is differentiable. So, approximate errors won’t be
exacerbated if the derivative $f'(x)$ is small. This concurs with the
behaviour we saw earlier with cobwebbing: recall that a fixed
point $\alpha$ of $f(x) = x$ is attracting if the derivative $f'(\alpha)$ lies between
$-1$ and $1$. The error improves by a factor of approximately $f'(\alpha)$
with each iteration.


For some iterations we can expect even faster convergence.
Newton’s method is a means of approximating solutions of an
equation $f(x) = 0$. The idea for the iteration is captured in
Figure 29. If we have a first approximation $x_1$ to the solution $\alpha$,
we might expect that

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$

is a better approximation. This value of $x_2$ occurs where the
tangent line to $y = f(x)$ at the point $(x_1, f(x_1))$ crosses the $x$-axis.
(See the Appendix for details.)


29. Newton’s method.


We can then replace $x_1$ with $x_2$ to produce the next iteration $x_3$,
and so on. Under technical but broad conditions (essentially for a
graph that looks like the one plotted in Figure 29) the iterations
$x_n$ will converge as a decreasing sequence to the solution $\alpha$.
Even better, this convergence is quadratic; this means that there
is a positive constant $M$ such that

$$|x_{n+1} - \alpha| \le M|x_n - \alpha|^2.$$

The expression $|x_n - \alpha|$ is the error of the $n$th iteration $x_n$ from the
solution $\alpha$. This will be a small number as the iteration progresses,
so its square $|x_n - \alpha|^2$ will be yet smaller. Such quadratic
convergence will be much faster than the linear convergence
achieved by cobwebbing.
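The speed of quadratic convergence is striking when seen numerically. The sketch below applies Newton's method to the equation $x^2 - 2 = 0$, an example chosen by us for illustration (it is not taken from the text), and prints each iterate with its error from $\sqrt{2}$; the function name `newton` is our own.

```python
import math

def newton(f, f_prime, x1, steps):
    """Return the Newton iterates x1, x2, ... after the given number of updates."""
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / f_prime(x))   # where the tangent at (x, f(x)) meets the x-axis
    return xs

# Illustrative equation (an assumption, not from the text): x**2 - 2 = 0, with solution sqrt(2).
iterates = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 2.0, 5)
errors = [abs(x - math.sqrt(2.0)) for x in iterates]
for x, e in zip(iterates, errors):
    print(x, e)
```

The number of correct decimal places roughly doubles at each step, and the iterates decrease towards the solution, as described above.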


Maintaining some bounds on errors is an important part of 
numerical analysis, and it provides certainty that the 
approximations work to the required accuracy. For Euler’s method
it can be shown that the error is proportional to the increment $h$,
and so the estimates will converge to the correct solution as $h$
decreases. Similarly, it can be shown that the error for the
trapezium rule with $n$ intervals is proportional to $1/n^2$, while the
error for Simpson’s rule is proportional to $1/n^4$. This shows that
Simpson’s rule is superior.




Linearization and stability 


The motion of a swinging pendulum can be modelled by the
differential equation

$$\theta''(t) = -\frac{g}{l}\sin\theta(t).$$

A pendulum consists of a light rod of length $l$ with a mass at its
end; here $g$ denotes acceleration due to gravity and $\theta(t)$ denotes
the angle the pendulum makes with the downward vertical at time
$t$ (Figure 30(a)).


If the pendulum starts from rest at an angle $\alpha$, then this initial
value problem can be largely solved to show that the pendulum
swings with period

$$T = 4\sqrt{\frac{l}{2g}} \int_0^{\alpha} \frac{d\theta}{\sqrt{\cos(\theta) - \cos(\alpha)}}.$$

Unfortunately, this integral cannot be evaluated exactly.


However, if the pendulum makes only small oscillations, then
we can use the trigonometric series to approximate the
pendulum’s motion.


30(a). Pendulum. 30(b). Period of a pendulum.


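Recall the sine and cosine series:

$$\sin(\theta) = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \frac{\theta^7}{7!} + \cdots, \qquad \cos(\theta) = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \frac{\theta^6}{6!} + \cdots.$$

If $\theta$ is small, then $\theta, \theta^2, \theta^3, \theta^4, \ldots$ become ever smaller. So, for
suitably small angles, $\theta - \theta^3/6$, or even just $\theta$, will be a reasonable
approximation for $\sin(\theta)$. If we use the approximation $\sin(\theta) \approx \theta$
in the original differential equation, we get $\theta''(t) = (-g/l)\theta$, and
this is an initial value problem that we can solve, obtaining

$$\theta(t) = \alpha \cos\left(\sqrt{\frac{g}{l}}\, t\right).$$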


Note that the period of this function, $2\pi\sqrt{l/g}$, does not depend on
the initial angle $\alpha$, provided that $\alpha$ is small. This is a good first
approximation to $T$, accurate to one per cent for $\alpha$ up to 20°, but
we might use more terms from the sine or cosine series to produce
yet better approximations, such as the power series

$$T = 2\pi\sqrt{\frac{l}{g}}\left(1 + \frac{\alpha^2}{16} + \frac{11\alpha^4}{3072} + \cdots\right).$$

Now (Figure 30(b)) $T\sqrt{g/l}$ as a function of $\alpha$ is the top graph
plotted alongside the approximations for $T\sqrt{g/l}$ using the series
approximation up to the $\alpha^2$ term (bottom graph) and also up to
the $\alpha^4$ term (middle graph); this last approximation is almost
indistinguishable from the true graph.
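The quality of these approximations can be checked numerically. The sketch below does not evaluate the period integral directly; instead it relies on a standard identity, not used in the text, expressing the period through the arithmetic-geometric mean: $T\sqrt{g/l} = 2\pi/\mathrm{agm}(1, \cos(\alpha/2))$. All function names are our own.

```python
import math

def agm(a, b):
    """Arithmetic-geometric mean of a and b (ten rounds are ample at double precision here)."""
    for _ in range(10):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return a

def scaled_period(alpha):
    # T * sqrt(g / l) for release angle alpha in radians (alpha below pi), via the identity
    # T = 2*pi*sqrt(l/g) / agm(1, cos(alpha / 2)); this is not the route taken in the text.
    return 2.0 * math.pi / agm(1.0, math.cos(alpha / 2.0))

def series_period(alpha, terms):
    # Truncations of 2*pi*(1 + alpha**2/16 + 11*alpha**4/3072 + ...).
    coefficients = [1.0, alpha ** 2 / 16.0, 11.0 * alpha ** 4 / 3072.0]
    return 2.0 * math.pi * sum(coefficients[:terms])

alpha = math.radians(20.0)
print(scaled_period(alpha) / (2.0 * math.pi))     # ratio to the small-angle period
print(series_period(alpha, 2) / (2.0 * math.pi))  # series up to the alpha**2 term
print(series_period(alpha, 3) / (2.0 * math.pi))  # series up to the alpha**4 term
```

At 20° the first ratio exceeds 1 by less than one per cent, in line with the accuracy claimed for the small-angle period.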


Earlier, when we replaced $\sin(\theta)$ by $\theta$, we were performing
linearization. This involves treating all powers of degree higher
than one as negligible. Linearization can help study the equilibria
of physical systems. An equilibrium is a state of a system where, in
principle at least, the system might remain at rest. For a pendulum, 
such an equilibrium has it hanging downwards. The previous 
discussion showed this equilibrium to be stable and determined 
the period of small oscillations about the equilibrium. Another 
equilibrium for the pendulum is when it hangs upright—this is 
theoretically possible, albeit precarious. We can linearize the 
differential equation about the vertical and see that realistically the 
pendulum will move exponentially away from the upward 
vertical—that is, this equilibrium is unstable. 


Linearization is an important tool of analysis, especially for the 
study of equilibria, but it fails to capture the qualitative global 
nature of general systems. A system might exhibit chaos, meaning 
that the system can evolve in vastly different ways from two very 
close starting positions, the so-called butterfly effect. Chaos is seen
in a system as simple as the double pendulum, where a second
pendulum is attached to the bottom of a pendulum. Consequently,
chaos causes great challenges for numerical methods. 


The next chapter introduces the calculus of functions of more 
than one variable, to which the numerical and approximating 
techniques that have been described here can be extended. For 
example, in the previous chapter (Figure 15(b)), we saw that there
are two equilibria for the Lotka-Volterra model: one is when
$F = R = 0$, so that both populations are absent/extinct and
this is an unstable equilibrium; the second equilibrium is when
$F = b/k$ and $R = m/a$; this is the point at the centre of the
egg-shaped cycles. This equilibrium is stable: if the system is at a
nearby point, linearization can be used to show that the point
$(F, R)$ cycles on a small egg-shaped curve around the equilibrium
with a period of $2\pi/\sqrt{bm}$ (see the Appendix for details).


Chapter 5
Dimensions aplenty 


In Chapter 2 we introduced the calculus of real functions of one 
variable—that is, functions that take one numerical input and 
produce one numerical output. Most functions, though, are not 
of such a simple form. As you read this, in this instant, the 
temperature $T$ around you might be described as a function of
three spatial co-ordinates $x, y, z$ needed to describe a point’s
position; $T(x, y, z)$ would then denote the current temperature at
that point. This function $T$ has three numerical inputs and one
numerical output. It seems physically reasonable that $T$ should
be both continuous and differentiable, but perhaps it is not
quite clear what it means for a function $T(x, y, z)$ with three
inputs to be differentiable: we certainly can’t just graph this
function in the plane and define the derivative as the gradient 
of a tangent line. 


Despite this, you may have heard the phrase ‘temperature 
gradient’ used. Informally, this might relate to the change in 
temperature as you move in a particular direction. If the 
temperature gradient were negative, then you would be getting 
colder, and if positive, getting hotter. More precisely, the 
temperature gradient is a vector—its magnitude is the greatest 
rate of increase in temperature over all directions, and its 
direction is how you would need to move to appreciate that 
greatest rate of increase. 


Scalars and vectors


In mathematics and science, there is an important distinction
between scalar quantities and vector quantities. Temperature is a
scalar quantity, meaning that the output $T(x, y, z)$ is a single real
number. Many quantities are scalar, including distance, speed,
angle, density, mass, volume, energy, charge, power,
and probability. All of these can be represented by a single
real number, which may be positive, zero, or negative in
particular cases.


However, many quantities cannot be represented in this 
way—examples include wind velocity, gravitational force, 
electromagnetic fields, and angular velocity. It may be the case that 
the wind is blowing at five metres per second—this is the speed of 
the wind or its velocity’s magnitude—but to fully describe the 
wind’s velocity, we would need to describe the wind’s direction as 
well: to what extent it is blowing forwards or backwards, left or 
right, up or down. Quantities that have a magnitude and a 
direction are called vectors. 


Once we have introduced three spatial co-ordinates $x, y, z$, a vector
quantity can be represented by three components $(v_x, v_y, v_z)$. Such
a vector might also be denoted by a single letter in bold such as v.
If v were to represent wind velocity, then $v_z$ would be the extent to
which the wind was blowing up (if $v_z$ is positive) or down (if $v_z$ is
negative). And the magnitude of v is denoted as |v|, which is given
by the formula

$$|\mathbf{v}| = \sqrt{(v_x)^2 + (v_y)^2 + (v_z)^2}.$$

If v represents wind velocity, then |v| is the wind speed. If a vector
r = $(x, y, z)$ represents the position of a point from an origin
$(0, 0, 0)$ (Figure 31(a)), then, by Pythagoras’ theorem, |r| is the
length of the vector r, or equally the distance of the point $(x, y, z)$
from the origin. A two-dimensional wind flow is shown in
Figure 31(b).


31(a). The point (1,2,2). 31(b). The wind flow v(x, y) = (y, x).


Finally, we need to be clear what we mean by direction. If a wind 
with velocity v blows in a certain direction, then a second wind 
with velocity 2v blows in the same direction but with twice the 
speed. Consequently, we will use unit vectors to describe 
directions—that is, vectors of magnitude (or length) one. 
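As a small illustration, the magnitude of a vector and the unit vector in its direction can be computed as follows. The function names are our own; the point (1, 2, 2) is the one shown in Figure 31(a).

```python
import math

def magnitude(v):
    """Length |v| of a vector given as a tuple of components."""
    return math.sqrt(sum(c * c for c in v))

def unit_vector(v):
    """Vector of magnitude one pointing in the same direction as v (v must be non-zero)."""
    m = magnitude(v)
    return tuple(c / m for c in v)

r = (1.0, 2.0, 2.0)
print(magnitude(r))                   # 3.0
print(unit_vector(r))
print(unit_vector((2.0, 4.0, 4.0)))   # twice the vector, but the same direction
```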


Directional and partial derivatives 


Let T (x,y,z) be the temperature T at a point (x,y,z). Depending 
how we move from this point, we might get colder or warmer. Using 
vectors, we now have an appropriate language to describe such 
movements. If we move in a certain direction, represented by a unit 
vector u, then the directional derivative of T in the direction u is 
the rate of change of temperature T as we move in direction u. 


There are three particularly important directional derivatives, the
so-called partial derivatives, which correspond to the direction u
being parallel to one of the three co-ordinate axes. When
u = $(1, 0, 0)$, so that the direction is parallel to the $x$-axis, the
directional derivative is denoted by

$$\frac{\partial T}{\partial x},$$

and called the ‘partial derivative of $T$ with respect to $x$’. It
is read as ‘partial d $T$ by d $x$’. This is the rate of change in $T$
as we increase $x$ while keeping $y$ and $z$ constant. There are
similar partial derivatives $\partial T/\partial y$ when u = $(0, 1, 0)$ and $\partial T/\partial z$ when
u = $(0, 0, 1)$.


For example, if $T(x, y, z) = 3x^2 + 2y^2 + z^2 + 2xz$, then

$$\frac{\partial T}{\partial x} = 6x + 2z, \qquad \frac{\partial T}{\partial y} = 4y, \qquad \frac{\partial T}{\partial z} = 2z + 2x,$$

as in each case we differentiate $T$ with respect to the given variable
($x$ or $y$ or $z$) and treat the other two variables as if they are
constants.


A general directional derivative can be expressed in terms of the
partial derivatives. If u = $(u_1, u_2, u_3)$ is a unit vector, then the
directional derivative of $T$ in the direction u equals

$$u_1\frac{\partial T}{\partial x} + u_2\frac{\partial T}{\partial y} + u_3\frac{\partial T}{\partial z}.$$

It follows that the vector

$$\operatorname{grad}(T) = \left(\frac{\partial T}{\partial x}, \frac{\partial T}{\partial y}, \frac{\partial T}{\partial z}\right)$$

is in the direction where $T$ increases fastest. This vector grad($T$) is
called the gradient vector of $T$, usually read ‘grad $T$’.
For the earlier function $T$, we have grad($T$) = $(6x + 2z, 4y, 2z + 2x)$.
So at the point $(1, 0, -1)$, the temperature is increasing fastest
parallel to $(4, 0, 0)$, which has direction $(1, 0, 0)$.
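The gradient can also be estimated numerically, which gives a check on the calculation above. The sketch below uses central differences, an approximation of our own choosing, and takes the directional derivative in the direction of a unit vector to be the sum of the partial derivatives weighted by that vector's components, which is the standard formula. The function names are hypothetical.

```python
def T(x, y, z):
    return 3 * x**2 + 2 * y**2 + z**2 + 2 * x * z

def numerical_gradient(func, point, eps=1e-6):
    """Approximate grad(func) at point using central differences in each co-ordinate."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += eps
        backward[i] -= eps
        grad.append((func(*forward) - func(*backward)) / (2 * eps))
    return tuple(grad)

def directional_derivative(func, point, u):
    """Rate of change of func at point in the direction of the unit vector u."""
    g = numerical_gradient(func, point)
    return sum(gi * ui for gi, ui in zip(g, u))

g = numerical_gradient(T, (1.0, 0.0, -1.0))
print(g)                                                             # approximately (4, 0, 0)
print(directional_derivative(T, (1.0, 0.0, -1.0), (0.0, 1.0, 0.0)))  # approximately 0
```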


Recall Fermat’s theorem: if a function $f(x)$ is maximal or minimal,
then $f'(x) = 0$. In a similar manner, if $T(x, y, z)$ is maximal or
minimal, then

$$\frac{\partial T}{\partial x} = \frac{\partial T}{\partial y} = \frac{\partial T}{\partial z} = 0 \quad \text{or, equivalently,} \quad \operatorname{grad}(T) = (0, 0, 0) = \mathbf{0}.$$

For the given example, grad($T$) = 0 at $(x, y, z) = (0, 0, 0)$, which is
actually a minimum for the function $T$. In the Appendix we show
how the least-squares error $E(a, b)$ from Chapter 4 can be
minimized to find the best-fit line.


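As with ‘full’ differentiation, second (and higher) partial
derivatives can be formed by repeated partial differentiation.
Under mild technical requirements, the order of differentiation
won’t matter; for example, using the earlier function $T(x, y, z)$,
we obtain the same answer whether we differentiate with respect
to $z$ first then $x$ or vice versa:

$$\frac{\partial^2 T}{\partial x\,\partial z} = \frac{\partial}{\partial x}\left(\frac{\partial T}{\partial z}\right) = \frac{\partial}{\partial x}(2z + 2x) = 2,$$

$$\frac{\partial^2 T}{\partial z\,\partial x} = \frac{\partial}{\partial z}\left(\frac{\partial T}{\partial x}\right) = \frac{\partial}{\partial z}(6x + 2z) = 2.$$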


Partial differential equations


We saw in Chapter 3 that the development of calculus was 
intimately linked with differential equations. In that chapter we 
were interested in ordinary differential equations (ODEs)—that 
is, ones involving a function $f(x)$ of a single input $x$. As examples,
the ODEs

$$\frac{df}{dx} = 0 \quad \text{and} \quad \frac{d^2 f}{dx^2} = 0$$

have general solutions

$$f(x) = c \quad \text{and} \quad f(x) = c_1 x + c_2,$$

where $c$, $c_1$, and $c_2$ are constants (that is, real numbers). Provided a
version of Picard’s theorem applies, the general solution to an order $n$
ordinary differential equation involves $n$ arbitrary constants.


In a certain sense, the above also applies to partial differential 
equations (PDEs)—differential equations involving partial 
derivatives. The first-order PDE 


$$\frac{\partial f}{\partial x} = 0,$$

involving a function $f(x, y)$ of two variables $x, y$, has the general
solution $f(x, y) = c(y)$. Here $c(y)$, rather than being an arbitrary
constant, is an arbitrary function of $y$. As the partial derivative
$\partial f/\partial x$ equals 0, $f(x, y)$ remains constant as $x$ varies or, put another
way, $f(x, y)$ can vary only with the other variables; here the only
other variable is $y$.


Likewise, the second-order PDEs 


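$$\frac{\partial^2 f}{\partial x^2} = 0, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 0, \qquad \frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial y^2}$$

respectively have solutions

$$f(x, y) = x\,c_1(y) + c_2(y), \qquad f(x, y) = c_1(x) + c_2(y), \qquad f(x, y) = c_1(x + y) + c_2(x - y),$$

where $c_1$ and $c_2$ are arbitrary functions.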


The study of PDEs is a substantial field of mathematics, with such 
equations found ubiquitously across science. We shall meet some 
important PDEs in the next two chapters. 


The calculus of variations 


A good deal of calculus is about optimization—what is the least or 
greatest that something can be. When that something depends on 
only one input, Fermat’s theorem tells us that the derivative has to 
equal zero. When there are several inputs, the partial derivatives 
need to be zero, but some optimization problems require more 
complicated mathematics than this. 


For example: what is the curve of shortest distance between two 
points in the plane? The correct answer is the line segment connecting 
the two points. But it takes considerable thought to understand what 
is being claimed here and what needs proving. 


By Pythagoras’ theorem, the two points $(0, 0)$ and $(1, 1)$ in
the plane are distance $\sqrt{2}$ apart, this being the length of the line
segment between them. Our claim is that this distance is less than
the length of any other curve connecting the two points.


Several such curves are sketched in Figure 32. For a general curve
$y = f(x)$, between $(0, 0)$ and $(1, 1)$, its length equals

$$\int_0^1 \sqrt{1 + (f'(x))^2}\,dx.$$


To each such curve we have assigned an integral and we seek to
find the smallest of all these integrals; here we claim this to be
when $f(x) = x$, as $y = x$ is the equation of the line connecting the
points. This is still clearly an optimization problem, but
mathematically one of much more sophistication than any
previously met.


32. Curves between the points $(0, 0)$ and $(1, 1)$; one curve is labelled with its length $d = \pi/2 \approx 1.571$.
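Numerical experiment cannot prove the claim, but it can make it plausible. The sketch below approximates the length of a curve by that of a fine polyline, a substitute of our own for evaluating the length integral exactly, and compares the straight line with two other curves joining $(0, 0)$ to $(1, 1)$. The function name `curve_length` and the choice of comparison curves are ours.

```python
import math

def curve_length(f, n=10000):
    """Approximate the length of y = f(x), 0 <= x <= 1, by a polyline with n segments."""
    total = 0.0
    for i in range(n):
        x0, x1 = i / n, (i + 1) / n
        total += math.hypot(x1 - x0, f(x1) - f(x0))
    return total

print(curve_length(lambda x: x))                           # the line segment, length sqrt(2)
print(curve_length(lambda x: x ** 2))                      # a parabola through the same endpoints
print(curve_length(lambda x: math.sin(math.pi * x / 2)))   # a sine arc through the same endpoints
```

Both alternative curves come out longer than $\sqrt{2}$, as the claim predicts.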


Such an integral is minimal when $F(x, f, f')$, the function being
integrated, satisfies the Euler-Lagrange equation,

$$\frac{d}{dx}\left(\frac{\partial F}{\partial f'}\right) = \frac{\partial F}{\partial f}.$$

In the given example we have $F(x, f, f') = \sqrt{1 + (f')^2}$, and it’s
relatively straightforward to show that $f(x) = x$ satisfies the
Euler-Lagrange equation in this case (see Appendix). The more
general case for curves between any two points can be similarly
demonstrated.


Further, the Euler-Lagrange equation can be used to find the 
shortest curves on surfaces, known as geodesics—for example, the 
great circles on a sphere are geodesics (a great circle cuts a sphere 
into two hemispheres)—and light travels along geodesics in the 
theory of general relativity. 


These problems are typical of the calculus of variations. A famous
early question was the brachistochrone problem: given two
points, one higher than the other (Figure 33(b)), what is the shape
of a smooth wire, connecting the two points, so that a bead takes
the shortest time possible, sliding from the higher point to the
lower under gravity? The problem’s name ‘brachistochrone’ derives
from the Greek for ‘shortest time’.


33(a). A cycloid. 33(b). A solution to the brachistochrone.


This problem was posed in 1696 by Johann Bernoulli, who himself 
knew the answer. He received correct solutions from his brother 
Jacob, Leibniz, and Newton. The answer is not, as you might guess, 
the straight line between the two, but rather an upside-down 
cycloid. 


The cycloid is a curve of geometric interest, as it is the curve traced 
out by a fixed point P on the circumference of a rolling circle 
(Figure 33(a)). The solution to the brachistochrone problem is the 
upside-down cycloid beginning vertically and passing through the 
two points, and this problem can be solved using the 
Euler-Lagrange equation. In this case the integral associated with 
each function is the time taken for a particle to move along a 
smooth wire in the shape of the function’s graph. 


A similar problem is the isoperimetric problem: given a loop of
string, what’s the largest region you can encompass with it? You
might guess the answer is a circle, and you’d be right: any
optimizing curve has to be a circle. But this wasn’t proven until 1870
by Weierstrass. Specifically, he proved the isoperimetric
inequality: if the string has length $L$, and bounds a region of
area $A$, then

$$L^2 \ge 4\pi A,$$

with $L^2 = 4\pi A$ holding only for circles (when $L = 2\pi r$ and $A = \pi r^2$,
where $r$ is the circle’s radius). So given string of length $L$, the greatest
area it can bound is $L^2/(4\pi)$.
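The inequality can be illustrated, though of course not proved, with regular polygons of fixed perimeter: their areas increase with the number of sides but never exceed the circle's. The sketch uses the standard area formula for a regular polygon; the function name is our own.

```python
import math

def regular_polygon_area(n_sides, perimeter):
    """Area of a regular polygon with the given number of sides and total perimeter."""
    return perimeter ** 2 / (4.0 * n_sides * math.tan(math.pi / n_sides))

L = 1.0                              # length of the loop of string
bound = L ** 2 / (4.0 * math.pi)     # area of the circle with circumference L
for n in (3, 4, 6, 12, 100):
    area = regular_polygon_area(n, L)
    print(n, area, area <= bound)
print("circle", bound)
```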


The same question can be posed in higher dimensions, and in one 
dimension higher the optimizing surfaces are spheres. This is the 
reason that soap bubbles are spherical. Because of energy
considerations, the soap bubble seeks to reduce its surface tension,
and so area, while containing the same volume of air. Likewise,
a soap film spanning a wire boundary seeks to minimize its area,
and hence the surfaces that soap films make are examples of
minimal surfaces. For example, given two parallel circular
boundaries, the soap film forms a catenoid (Figure 34).


34. A catenoid.


In 1760, Lagrange first posed the following problem: given a
bounding wire, is there a minimal surface having that boundary?
Physically, this seems intuitively clear, as it’s whatever form a soap
film would take spanning the wire; the problem is now known as
Plateau’s problem after the physicist Joseph Plateau, who
investigated this problem using soap films. However, a
mathematical proof of the existence of smooth minimal surfaces
for a general boundary involved surmounting significant technical
difficulties and was not completed until 1970.


Multivariable integration 


Just as there is a differential calculus of several variables, so there is 
an integral calculus too. In one variable, integration is a means of 
rigorously defining a sum of infinitesimal contributions and 
calculating (signed) areas under a graph. But integrals can 
represent other quantities: as examples, if we integrate a velocity, a 
displacement is evaluated, and integrals can also represent 
probabilities. 


If a function $f(x)$ is positive, then the integral $\int_a^b f(x)\,dx$
represents the area below the graph $y = f(x)$ and above the interval
$a \le x \le b$. Given a positive function $f(x, y)$ of two variables, its graph
$z = f(x, y)$ is a surface above the $xy$-plane, and it seems reasonable
that some integral should represent the volume under this surface
and above a region $R$ of the plane. This volume is denoted by

$$\iint_R f(x, y)\,dA$$

(Figure 35(a)). Previously, we informally thought of $f(x)\,dx$ as an
infinitesimally thin rectangle of height $f(x)$ and base $dx$. Now
$f(x, y)\,dA$ is the volume of a prism, of height $f(x, y)$ and
infinitesimally small base of area $dA$ in the $xy$-plane.


If the set $R$ is a complicated region, then subtle techniques may be
needed to calculate this volume, but when $R$ is a rectangle (a
natural generalization of an interval) such volumes can be readily
evaluated. If $R$ is the rectangle

$$a \le x \le b \quad \text{and} \quad c \le y \le d$$

(Figure 35(b)), then the integral can be determined as the
‘repeated’ or ‘double’ integral

$$\int_{x=a}^{x=b} \left( \int_{y=c}^{y=d} f(x, y)\,dy \right) dx.$$


35(a). Volume under a surface above a square. 35(b). Rectangle.

As an example, the volume shown above (Figure 35(a)), which lies
under the surface $z = x^2 + y^3$ and above the square $0 \le x, y \le 1$,
equals

$$\int_{x=0}^{x=1} \left( \int_{y=0}^{y=1} (x^2 + y^3)\,dy \right) dx.$$

To evaluate the inner $y$-integral, we treat $x$ as a constant and
note that $x^2 y + y^4/4$ is an antiderivative of $x^2 + y^3$. By the
fundamental theorem,

$$\int_{y=0}^{y=1} (x^2 + y^3)\,dy = \left(x^2 \cdot 1 + \frac{1^4}{4}\right) - \left(x^2 \cdot 0 + \frac{0^4}{4}\right) = x^2 + \frac{1}{4}.$$

This has an antiderivative $x^3/3 + x/4$, and so, by the
fundamental theorem, the volume is

$$\int_{x=0}^{x=1} \left(x^2 + \frac{1}{4}\right) dx = \left(\frac{1^3}{3} + \frac{1}{4}\right) - \left(\frac{0^3}{3} + \frac{0}{4}\right) = \frac{1}{3} + \frac{1}{4} = \frac{7}{12}.$$
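The value $7/12$ can be checked numerically by summing the volumes of many thin prisms. The sketch below uses the midpoint rule in each variable, an approximation chosen by us, with the inner sum over $y$ mirroring the repeated integral; the function name `double_integral` is our own.

```python
def double_integral(f, a, b, c, d, n=200):
    """Midpoint-rule estimate of the integral of f over a <= x <= b, c <= y <= d."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        # Inner sum over y with x held constant, mirroring the repeated integral.
        inner = sum(f(x, c + (j + 0.5) * hy) for j in range(n))
        total += inner * hy
    return total * hx

volume = double_integral(lambda x, y: x**2 + y**3, 0.0, 1.0, 0.0, 1.0)
print(volume, 7 / 12)
```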


Volumes aren’t the only things represented by such integrals. 


If the above square were made of a material with density
f(x, y) = x² + y³, then 7/12 would be the mass of that square. When
f(x, y) = 1, then $\iint_R f(x,y)\,dA$ equals the area of R. If the region R
were a disc representing a dartboard, and f(x, y) dA denoted the
infinitesimal probability of a dart landing at a point (x, y), then
$\iint_R f(x,y)\,dA$ would be the probability of the dart player hitting the
dart board—we would expect this to equal 1 (representing 
certainty), or just below 1, given the occasional miss. If s(x, y) 
were the score received for a dart landing at (x, y), then
$\iint_R s(x,y)\,f(x,y)\,dA$ would be the average score achieved with
a dart. These ideas extend to three and higher dimensions.


Note there is a natural order when we sum an integral $\int_a^b f(x)\,dx$;
we think of x as increasing, varying from x = a up to x = b. With a
rectangular region R, there are two natural ways to cover the
rectangle: either vary y first and then x (as we did above) or vice versa.
For more complicated regions, there may not seem any preferential
way to let (x, y) vary over R. In some instances, the order of integration
can matter, as Riemann showed for infinite sums. But if the modulus
|f(x, y)| is integrable on R, then the integral $\iint_R f(x,y)\,dA$ always
gives the same value, irrespective of the order in which it is summed.
In 1837 Dirichlet demonstrated the same for infinite sums: if the sum 
of the modulus of the terms converges, then the infinite sum always 
gives the same answer, no matter how it’s rearranged. 
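
The effect of rearrangement on an infinite sum can be seen in a small experiment. The sketch below, with illustrative function names, adds the terms of the alternating series 1 − 1/2 + 1/3 − 1/4 + ... first in the usual order and then rearranged so that each positive term is followed by two negative ones. The moduli of these terms do not have a convergent sum, and the two orderings approach different values, namely ln 2 and (1/2) ln 2.

```python
import math


def usual_order(n_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in its usual order."""
    return sum((-1) ** (k + 1) / k for k in range(1, n_terms + 1))


def rearranged(n_blocks):
    """Same terms, reordered as 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...

    Each block is one positive term followed by two negative terms.
    """
    total = 0.0
    for k in range(1, n_blocks + 1):
        total += 1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
    return total


print(usual_order(100000), math.log(2))
print(rearranged(100000), 0.5 * math.log(2))
```

Every term of the original series appears exactly once in the rearranged sum, so only the order of summation has changed.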


Surface integrals and flux 


A multivariable integral can represent the area of a region in 
the plane, but we might wish to calculate the area—or surface 


36. Latitude and longitude.


area—of a non-planar surface, such as a sphere or a cone situated 
in three-dimensional space. You may even know such
formulae already—for example, the area of a sphere of radius a
is 4πa².


Our first problem is how we might describe a surface like a sphere, 
and one way to do this is to assign co-ordinates to the surface. For a 
sphere we might use latitude and longitude. We can measure the 
angle θ of latitude from the north pole, where θ = 0; past the
equator, where θ = π/2; or to the south pole, where θ = π
(Figure 36). Similarly, we can measure the angle φ of longitude
from the prime/Greenwich meridian, going once around the world
so that φ = 2π when we return to Greenwich. (Note we are again
using radians here, and the ranges of θ and φ are different to those
traditionally used on a map or globe.) 


The area of a planar region R is $\iint_R dA$. Informally, we can think of
this as a sum of infinitesimal rectangles of area dA = dx dy, where
the point (x, y) ranges over R (Figure 35(b)). To determine the
surface area of the sphere, we need to sum infinitesimal elements
dS of surface area, as the co-ordinates range over all possible
values. However, it is not simply the case that dS = dθ dφ on a
sphere, as dS is no longer the area of an infinitesimal rectangle.
Certainly, we should expect dS to depend on the sphere’s radius—it
should be proportional to a², as this is how the sphere’s area
scales with its radius. But dS may also depend on the co-ordinates
θ and φ. The shaded cap shown in Figure 36 is the region given by
0 ≤ θ ≤ α. If we increase α to α + dθ, then the shaded area grows at
different rates depending on the value of α, even when using the
same increment dθ. If we’re near either pole, then the area barely
increases, but near the equator it increases the most, with the extra 
area making a band around the sphere at its ‘fattest’. By 
comparison, the symmetry that the sphere has about its north-south
pole axis implies that an increment in φ has the same effect,
irrespective of where we are on the sphere. 


The correct formula for an infinitesimal element of surface area is 
$$dS = a^2\sin(\theta)\,d\theta\,d\phi.$$


Note that when we are at the poles—that is, θ = 0 or θ = π—then
sin(θ) = 0, and when we are at the equator—that is, θ = π/2—then
sin(θ) = 1, reflecting the different increments in surface area
that the same change in θ can produce. We can now determine
the surface area of the shaded cap as


$$\int_{\theta=0}^{\theta=\alpha}\int_{\phi=0}^{\phi=2\pi} a^2\sin(\theta)\,d\phi\,d\theta = a^2\bigl(-\cos(\alpha)-(-\cos(0))\bigr)(2\pi-0) = 2\pi a^2\,(1-\cos(\alpha)),$$


recalling that −cos(θ) is an antiderivative of sin(θ), and using the
fundamental theorem. At the latitude of the south pole, the cap
becomes the whole sphere; there α = π and cos(α) = −1 so that we
obtain 4πa² as expected.
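
The cap formula can be checked numerically by summing the elements dS = a² sin(θ) dθ dφ directly. The sketch below is only illustrative: the name `cap_area` and the number of strips are arbitrary choices, and because the integrand does not depend on φ, the φ-sum is replaced by the factor 2π.

```python
import math


def cap_area(a, alpha, n=2000):
    """Sum dS = a^2 sin(theta) dtheta dphi over 0 <= theta <= alpha, 0 <= phi <= 2*pi."""
    dtheta = alpha / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta  # midpoint of the i-th strip
        # Each strip is a thin band round the sphere; the phi-range contributes 2*pi.
        total += a**2 * math.sin(theta) * dtheta * 2 * math.pi
    return total


a, alpha = 2.0, 1.0
print(cap_area(a, alpha), 2 * math.pi * a**2 * (1 - math.cos(alpha)))
print(cap_area(a, math.pi), 4 * math.pi * a**2)  # the whole sphere
```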


The above calculation leads naturally to a discussion of solid 
angle. In the plane, the angle between two half-lines equals the arc 
length of the unit (= radius 1) circle between the half-lines. This is 


37(a). Plane angle. 37(b). Solid angle in a cone.


provided we are using radians to measure angles (Figure 37(a)). In 
a similar fashion, we can define the solid angle at the apex of a cone 
as the surface area of the unit sphere that the cone cuts out 
(Figure 37(b)). If the semi-angle in the cone is α, then the
previous calculation shows that the solid angle is equal to
2π(1 − cos(α)).


The unit of solid angle is the steradian. In the same way that the 
unit circle having length 2π means that there are 2π radians in a
whole angle, the unit sphere having area 4π means that there are
4π steradians in a whole solid angle.


We can now work out the solid angle when looking at the Sun or 
Moon. Each can be modelled as a sphere, with radius R at distance 
D from the observer (Figure 37(b)). We see that 


$$\cos(\alpha) = \frac{\sqrt{D^2-R^2}}{D} = \sqrt{1-\left(\frac{R}{D}\right)^2}.$$


The approximate values of R and D for the Sun and Moon are: 


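$R_{\text{sun}}$ = 7 × 10⁵ km, $D_{\text{sun}}$ = 1.5 × 10⁸ km, $R_{\text{sun}}/D_{\text{sun}}$ = 4.67 × 10⁻³;
$R_{\text{moon}}$ = 1.8 × 10³ km, $D_{\text{moon}}$ = 3.8 × 10⁵ km, $R_{\text{moon}}/D_{\text{moon}}$ = 4.73 × 10⁻³.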


Consequently, the Sun and Moon occupy much the same portion of
the sky, despite being considerably different in their actual sizes. 
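
To put numbers on this comparison, here is a short sketch that evaluates the solid angle 2π(1 − cos(α)) for the approximate values of R and D quoted above. It assumes that the viewing cone just touches the sphere, so that sin(α) = R/D; that assumption, and the function name `solid_angle`, are ours rather than the text's.

```python
import math


def solid_angle(R, D):
    """Solid angle, in steradians, of a sphere of radius R seen from distance D.

    Assumes the cone of sight is tangent to the sphere, so sin(alpha) = R / D.
    """
    alpha = math.asin(R / D)
    return 2 * math.pi * (1 - math.cos(alpha))


sun = solid_angle(7e5, 1.5e8)    # radius and distance in km
moon = solid_angle(1.8e3, 3.8e5)
print(sun, moon, sun / moon)
```

Both values are tiny fractions of the whole solid angle 4π, and their ratio is close to 1. Because R/D is so small, each solid angle is very nearly π(R/D)².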


It’s somewhat more complicated to calculate the solid angle made 
by a more irregular object. If you look around you at an object, and 
envisage a sphere of unit radius centred on one of your eyes, then 
the solid angle made is how much of that sphere is taken up by your 
view of the object. The surface of the object you’re looking at is 
made up of infinitesimal elements dS of surface area; however, the 
same amount of surface area will obstruct differing amounts of 
your vision depending on how far away the object is and how 
obliquely you are looking at it. 


If you were looking at part of a sphere with area S, all of which was 
distance r away, then it would make a solid angle S/r². The scale
factor of 1/r² represents how your view of an object diminishes as it
recedes; equally, r² represents how a sphere’s area grows as its
radius r increases. For most objects, though, different parts of an 
object are at different distances from your eye, so we would need to 
use an individual value of r for each part. In addition, the same 
surface area, at a certain distance away, can obstruct more or less of 
your view depending how full-on it appears to you. A sheet of paper 
might be almost invisible if close to being on its side, in contrast to 
when looked at fully. Putting all this together, the solid angle of an 
irregular object is defined to be the integral over the viewed surface 
of the obstructive component of each scaled element of area dS/r².


Solid angle is an example of a flux integral. Flux is an important 
notion in fluid dynamics, thermodynamics, and electromagnetism, 
and is a measure of the rate at which a quantity passes through a 
surface. 


For example, suppose that a fluid has constant velocity v when 
moving through a pipe with cross-sectional area A. Then the 
amount of fluid passing the cross-section, or flux, is vA per second. 
Here, the fluid is moving perpendicularly to the cross-section, but 



38. Flux. 


if the fluid moves obliquely to the boundary (Figure 38), then the
flux crossing an area A equals vA cos(α). Note that when α = 0, the
flux equals vA, as the fluid is moving perpendicular to the
boundary, while when α = π/2, the flux equals 0, as the fluid is
moving parallel to the boundary, rather than through it. 


This expression v cos(α) is commonly denoted by v · n, where n is
the unit vector perpendicular to the surface and v is the velocity
vector of the fluid. (Some readers may recognize this as a scalar
product.) More generally, the velocity v of a fluid is not constant and
nor is the perpendicular unit vector n to a surface constant, so the
flux across a general surface Σ equals a sum of the flux across
infinitesimal elements dS of surface—that is, the flux equals


$$\iint_\Sigma \mathbf{v}\cdot\mathbf{n}\,dS.$$
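
For a constant flow across a flat surface, the flux integral reduces to the scalar product of v and n multiplied by the area. The following sketch, with made-up values for the speed, area, and angle, confirms that this agrees with vA cos(α). The function name is illustrative.

```python
import math


def flux_constant_flow(v, n, area):
    """Flux of a constant velocity vector v across a flat surface with unit normal n."""
    dot = sum(vi * ni for vi, ni in zip(v, n))  # the scalar product of v and n
    return dot * area


speed, area, alpha = 3.0, 2.0, math.pi / 3  # hypothetical values
n = (0.0, 0.0, 1.0)                         # unit normal to the surface
v = (speed * math.sin(alpha), 0.0, speed * math.cos(alpha))  # flow at angle alpha to n

flux = flux_constant_flow(v, n, area)
print(flux, speed * area * math.cos(alpha))
```

Setting alpha to 0 gives the full flux vA, while a flow parallel to the surface gives zero.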


Line integrals and work 


Much of what follows in this chapter first interested applied 
mathematicians and physicists. The nature of these integrals gives 


39(a). Work done by gravity. 39(b). A whirlpool v(x, y) = (−y, x).


a sense of their physical origins; for example, line integrals are 
perhaps best introduced by thinking of the work done by a force. 


Gravity acts around us in a (relatively) uniform, constant way—we 
model acceleration due to gravity as the vector (0, 0, —g). That is, 
gravity acts downwards with constant strength g, which is 
approximately 9.81 m s⁻². By Newton’s second law, the
gravitational force on an object of mass m equals F = (0, 0, —mg). 


If we lower that object by a height of h, then gravity has done work 
and the object has lost gravitational potential energy in the amount 
of mgh. An important point here is that the energy lost equals mgh, 
irrespective of which path is taken between the start and end of the 
journey (Figure 39(a)). By contrast, we can look at a vector flow 
representing a whirlpool at the origin (Figure 39(b)). Here, it’s clear 
that if you swam between the start and end points labelled, different 
amounts of work would be done depending on whether you swam 
with the flow (anticlockwise) or against it (clockwise). 


If an object were moved by a constant force F in a straight line 
distance d, the work done would equal Fd. More generally, the 
force F would not be constant, and nor would the path taken be a 
straight line. For example (Figure 39(a)), while the top path is 
horizontal no work is being done, because gravity has no 
component in the direction of travel; if the path is vertical the full
force of gravity is acting, and at other points some component of 
gravity is acting. 


For a general motion, if we move an infinitesimal amount, 
dr = (dx, dy, dz),


in three-dimensional space and in the presence of a force
F = ($F_x$, $F_y$, $F_z$), then the infinitesimal work done by the force is
denoted F · dr and equals


$$\mathbf{F}\cdot d\mathbf{r} = F_x\,dx + F_y\,dy + F_z\,dz.$$


In this expression, the horizontal component of the movement
(dx, 0, 0) picks out the horizontal component of the force
($F_x$, 0, 0), and likewise for the other co-ordinates. At the points
where the path is horizontal, dz = 0, and so F · dr = 0 when F is
the gravitational force, as $F_x$ = $F_y$ = 0 as well (Figure 39(a)). The
total amount of work done, as we move along a curve C in a certain
direction, is denoted by


$$\int_C \mathbf{F}\cdot d\mathbf{r},$$


and equals the sum of all the infinitesimal work contributions as
we make the journey along the curve C. If we move along C in the
opposite direction, the work done changes sign.


For gravity F = (0, 0, −mg) we have F · dr = −mg dz and −mg has
antiderivative −mgz. So, for any curve C starting at a point P which
is height h above the finishing point Q,


$$\int_C \mathbf{F}\cdot d\mathbf{r} = -\int_P^Q mg\,dz = -mg\,\bigl(z(Q)-z(P)\bigr) = -mg(-h) = mgh,$$


by the fundamental theorem. What makes the gravitational force F
special in this regard is that F is conservative, which means that
F = grad(f) is the gradient vector of a function f called a
potential. When F = grad(f) is conservative, we have


$$\int_P^Q \mathrm{grad}(f)\cdot d\mathbf{r} = f(Q) - f(P),$$


which is essentially just another version of the fundamental 
theorem. It says that the work done by a conservative force 
along a curve C is just the change in potential as we move from 
P, the start of C, to its end Q. For gravity, F = grad(f) where
f = −mgz.


When the path is a loop, so that P and Q are the same point, no 
work is done, which is why such fields are termed conservative. 
Most fields aren’t conservative: for example, the ‘whirlpool’ seen 
previously (Figure 39(b)) isn’t, because different amounts of work 
are done going around the two semi-circles even though they have 
the same starting points and endpoints. 
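
The contrast between gravity and the whirlpool can be illustrated numerically by adding up the contributions F · dr along short straight steps of a path. In the sketch below the mass, the height, the particular paths, and the choice of semi-circles from (1, 0) to (−1, 0) are all assumptions made for illustration. The function `work` is a simple approximation, not a definitive method.

```python
import math


def work(F, path, n=2000):
    """Approximate the line integral of F . dr along path(t) for 0 <= t <= 1."""
    total = 0.0
    for i in range(n):
        p0 = path(i / n)
        p1 = path((i + 1) / n)
        mid = tuple((a + b) / 2 for a, b in zip(p0, p1))  # evaluate F mid-step
        dr = tuple(b - a for a, b in zip(p0, p1))         # the small displacement
        total += sum(Fi * di for Fi, di in zip(F(*mid), dr))
    return total


m, g, h = 2.0, 9.81, 5.0
gravity = lambda x, y, z: (0.0, 0.0, -m * g)
straight = lambda t: (0.0, 0.0, h * (1 - t))                           # straight down
curved = lambda t: (math.sin(math.pi * t), t**2, h * (1 - t) ** 3)     # a winding descent

whirlpool = lambda x, y: (-y, x)
upper = lambda t: (math.cos(math.pi * t), math.sin(math.pi * t))    # anticlockwise semi-circle
lower = lambda t: (math.cos(math.pi * t), -math.sin(math.pi * t))   # clockwise semi-circle

print(work(gravity, straight), work(gravity, curved), m * g * h)
print(work(whirlpool, upper), work(whirlpool, lower))
```

Both gravitational paths give mgh, whereas the two semi-circles in the whirlpool give approximately π and −π, so the work done there depends on the route taken.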


Stokes’ theorem and the divergence theorem 


A vector field v in two or three dimensions can represent the 
velocity of a fluid. For such flows, there are two further important 
properties: their curl and divergence. 


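The divergence div(v) of a vector field v = ($v_x$, $v_y$, $v_z$) in three
dimensions is given by


$$\mathrm{div}(\mathbf{v}) = \frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z}.$$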


(In two dimensions the formula involves just the first two terms 
on the right-hand side.) The physical meaning of divergence 
relates to the expansion or contraction of the fluid’s flow. The 
vector field v = (x, y), shown (Figure 40), has divergence
div(v) = ∂x/∂x + ∂y/∂y = 1 + 1 = 2. Also drawn is the progress of
a disc of fluid as it expands with time t while following the flow v. If
A(t) is the area of the disc at time t, we have


$$\frac{dA}{dt} = 2A.$$

The number 2 in this differential equation is the divergence of the 
flow. More generally, the divergence isn’t constant but measures 
the rate of local expansion (if positive) or contraction (if negative) 
of the fluid at a particular point and time. The condition that
div(v) = 0 means that a fluid is incompressible, as is the case with
the rotating flow seen previously (Figure 39(b)). Incompressible
fluids are used in hydraulic machinery; the pressure applied to an
incompressible fluid is transmitted with very little loss and can be
directed along pipes and around corners in a versatile way.


40. v(x, y) = (x, y) with div v = 2.
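
The two-dimensional divergence can be estimated at any point by replacing each partial derivative with a small difference quotient. The sketch below does this for the expanding flow v = (x, y) and for the whirlpool v = (−y, x). The function name, the step size, and the sample point are illustrative choices.

```python
def divergence(v, x, y, h=1e-5):
    """Central-difference estimate of d(v_x)/dx + d(v_y)/dy at the point (x, y)."""
    dvx_dx = (v(x + h, y)[0] - v(x - h, y)[0]) / (2 * h)
    dvy_dy = (v(x, y + h)[1] - v(x, y - h)[1]) / (2 * h)
    return dvx_dx + dvy_dy


expanding = lambda x, y: (x, y)    # the flow with div(v) = 2
whirlpool = lambda x, y: (-y, x)   # the rotating flow

print(divergence(expanding, 0.3, -1.2))
print(divergence(whirlpool, 0.3, -1.2))
```

The first estimate is close to 2 wherever it is evaluated, and the second is close to 0, consistent with the whirlpool being incompressible.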


The divergence theorem is a higher-dimensional version of the 
fundamental theorem of calculus. It states that 


$$\iiint_R \mathrm{div}(\mathbf{v})\,dV = \iint_{\partial R} \mathbf{v}\cdot\mathbf{n}\,dS,$$


where:

• v represents a fluid’s velocity;
• R is a bounded region of three-dimensional space, such as a solid sphere or cube;
• ∂R is the surface of R, i.e. the spherical shell or six faces of the cube for the above examples;
• v · n is the rate of fluid leaving R across the boundary ∂R per unit area.


Given an infinitesimal volume of fluid dV, div(v)dV is the rate at 
which it is expanding or contracting, so the triple integral on the 
left of the divergence theorem is the sum of all these expansions 
and contractions inside the region R. On the right-hand side is the 
total flux of the fluid across the boundary of R. So, the divergence 
theorem, physically interpreted, states that the overall rate of 
expansion and contraction inside a region equals the rate of fluid 
leaking out and in across the boundary. 
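
As a concrete check of the divergence theorem, the sketch below compares the two sides for the unit cube 0 ≤ x, y, z ≤ 1. The flow v = (x², xy, z) is a hypothetical example chosen for illustration, with div(v) = 2x + x + 1 worked out by hand. Both integrals are approximated by midpoint sums.

```python
def v(x, y, z):
    """A hypothetical velocity field used only for this check."""
    return (x * x, x * y, z)


def div_v(x, y, z):
    # d(x^2)/dx + d(xy)/dy + d(z)/dz
    return 2 * x + x + 1


def midpoints(n):
    return [(i + 0.5) / n for i in range(n)]


def volume_integral(n=30):
    """Sum div(v) dV over small cubes filling the unit cube."""
    pts = midpoints(n)
    cell = (1 / n) ** 3
    return sum(div_v(x, y, z) * cell for x in pts for y in pts for z in pts)


def boundary_flux(n=30):
    """Sum v . n dS over the six faces, with n pointing out of the cube."""
    pts = midpoints(n)
    patch = (1 / n) ** 2
    total = 0.0
    for s in pts:
        for t in pts:
            total += (v(1, s, t)[0] - v(0, s, t)[0]) * patch  # faces x = 1 and x = 0
            total += (v(s, 1, t)[1] - v(s, 0, t)[1]) * patch  # faces y = 1 and y = 0
            total += (v(s, t, 1)[2] - v(s, t, 0)[2]) * patch  # faces z = 1 and z = 0
    return total


print(volume_integral(), boundary_flux())
```

On the faces x = 0, y = 0, and z = 0 the outward normal points in the negative co-ordinate direction, which is why those contributions are subtracted. Both sums come out at 5/2 for this flow.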


A physical interpretation of curl is comparatively subtle, and the 
detailed formula for curl is also rather involved, so is left to the 
Appendix. As an example, though, imagine the xy-plane rotating 
about the origin (Figure 41). If ω is the angular speed—meaning
that the rotation happens at a rate of ω radians per second—the
velocity of the point (x, y) is given by v = (−ωy, ωx) and
curl(v) = (0, 0, 2ω). So here, the magnitude of curl is twice the
angular speed, and its direction, (0, 0, 1), is parallel to the axis of
the rotation, in this case the z-axis.


41. Rotation around the origin.


For a general fluid flow, curl(v) is a measure of the flow’s vorticity, 
and it’s important to recognize that this is a measure of local 
behaviour. In the previous example the whole plane is rotating 
with the same angular velocity. More generally, for a small 
element of fluid moving in a flow v, the direction of curl(v) is 
parallel to the element’s axis of rotation and the magnitude of 
curl(v) is twice the angular speed of the element’s rotation. A fluid 
flow with zero curl is called irrotational. 


42(a). A shear flow. 42(b). Irrotational whirlpool.


Note that all points in Figure 41 are rotating with angular speed ω;
this can be appreciated by noting how the small crosses ‘+’ are each
rotating. The turning of these crosses is what’s important, not the
fact that points are moving globally in circles; the turning
crosses represent how the flow is spinning locally. The diagram on the
left (Figure 42(a)) shows a shear flow. Globally, particles move in
straight lines, but they themselves are spinning while doing so—as 
exhibited by the spinning crosses—and the curl is non-zero. By 
contrast, the diagram on the right (Figure 42(b)) illustrates a flow 
with zero curl but with particles moving globally in circles, without 
spinning while doing so, as exhibited by the crosses’ behaviour. 
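
For a flow confined to the xy-plane, only the z-component of curl can be non-zero, and it equals ∂v_y/∂x − ∂v_x/∂y. The sketch below estimates this by difference quotients for three flows: the rigid rotation, a shear flow, and a whirlpool with zero curl. The formula v = (y, 0) for the shear flow and the whirlpool formula are assumed examples of such flows, not necessarily the ones drawn in the figures.

```python
def curl_z(v, x, y, h=1e-5):
    """Central-difference estimate of d(v_y)/dx - d(v_x)/dy at the point (x, y)."""
    dvy_dx = (v(x + h, y)[1] - v(x - h, y)[1]) / (2 * h)
    dvx_dy = (v(x, y + h)[0] - v(x, y - h)[0]) / (2 * h)
    return dvy_dx - dvx_dy


omega = 1.5
rotation = lambda x, y: (-omega * y, omega * x)   # rigid rotation about the origin
shear = lambda x, y: (y, 0.0)                     # assumed shear flow
vortex = lambda x, y: (-y / (x * x + y * y), x / (x * x + y * y))  # assumed irrotational whirlpool

point = (0.7, -0.2)
print(curl_z(rotation, *point))   # close to 2 * omega
print(curl_z(shear, *point))      # non-zero, although particles move in straight lines
print(curl_z(vortex, *point))     # close to 0, although particles move in circles
```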


Again, there is a higher-dimensional version of the fundamental 
theorem which involves curl. This is called Stokes’ theorem, 
which states that 


$$\iint_R \mathrm{curl}(\mathbf{v})\cdot\mathbf{n}\,dS = \int_{\partial R} \mathbf{v}\cdot d\mathbf{r},$$


where:

• v represents a fluid’s velocity;
• R is a surface situated in three dimensions, such as a hemisphere;
• ∂R is the bounding curve of R, so a circle when R is a hemisphere;
• n is a unit vector, perpendicular to the surface R.


A physical interpretation of Stokes’ theorem is possible, but again is 
subtle. Curl measures twice the local spin of a vector field. We might 
visualize the curl of the field as spinning cogs on the surface R 
intermeshing with one another. The flux integral of the left-hand side 
of Stokes’ theorem is the sum of all the cogs’ contributions. No 
internal cog is driving the others; they are just simultaneously 
moving together, and no work is being done. However, for the cogs 
on the boundary, half their contact is with internal cogs but they have 
no contact on the other side of the boundary. If there were a 
caterpillar track on the boundary, then the external cogs would be 
doing work driving that track around, and this is what is measured by 
the work integral on the right-hand side of Stokes’ theorem. This is 
captured somewhat simplistically in the diagram below (Figure 43), 
where we see all the contributions cancel on the internal edges. 


43. Interpreting Stokes’ theorem physically. 
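
Stokes’ theorem can be checked in a simple case. Take R to be the flat unit disc in the xy-plane and v the rigid rotation (−ωy, ωx), for which curl(v) · n = 2ω at every point of the disc. The sketch below, with an arbitrary value of ω and illustrative function names, compares the work integral round the bounding circle with 2ω times the area of the disc.

```python
import math

omega = 1.5
v = lambda x, y: (-omega * y, omega * x)


def boundary_work(n=2000):
    """Approximate the line integral of v . dr anticlockwise round the unit circle."""
    total = 0.0
    for i in range(n):
        t0 = 2 * math.pi * i / n
        t1 = 2 * math.pi * (i + 1) / n
        x0, y0 = math.cos(t0), math.sin(t0)
        x1, y1 = math.cos(t1), math.sin(t1)
        vx, vy = v((x0 + x1) / 2, (y0 + y1) / 2)
        total += vx * (x1 - x0) + vy * (y1 - y0)
    return total


def curl_flux():
    # curl(v) . n equals 2 * omega everywhere on the disc, whose area is pi.
    return 2 * omega * math.pi


print(boundary_work(), curl_flux())
```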


Stokes’ theorem is named after the Irish physicist George Stokes,
though it should more properly be named after William Thomson 
(later Lord Kelvin), who wrote to Stokes with details of the 
theorem in 1850. Stokes’ name became attached to the theorem 
after he posed it as a University of Cambridge examination 
question in 1854. The special case of R being a region of the 
xy-plane is known as Green’s theorem, named after George
Green, who earlier stated his theorem in An Essay on the 
Application of Mathematical Analysis to the Theories of Electricity 
and Magnetism in 1828 (see Appendix). Special cases of the 
divergence theorem had been known to Lagrange and Gauss, but it 
was Mikhail Ostrogradsky who gave the first complete proof in 
1826 while studying heat flow. 


A final point of note is that the curl of a conservative vector field 
is zero. This is a relatively simple matter of algebra (see 
Appendix). More interesting is the converse question: if curl(v) 
is zero, is v conservative? Surprisingly, the answer is positive
or negative depending on the shape of the region that v is
defined on.


The answer is positive if v is defined on the whole xy-plane or 
whole three-dimensional xyz-space. However, the flow


$$\mathbf{v}(x,y) = \left(\frac{-y}{x^2+y^2},\ \frac{x}{x^2+y^2}\right)$$


(Figure 42(b)) is an example of a vector field with zero curl which 
is not conservative—that is, no potential exists. Note here that v 
is not defined at the origin where x = y = 0; so v is defined on a
punctured plane, a plane missing a point, and a region of 
crucially different shape which is not simply connected 
(informally, it has holes in it). For simply connected regions, 
the converse is true. We are now straying into the field of 
differential topology, which is where the study of these 
theorems properly lies. 
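
One way to see that the flow (−y/(x² + y²), x/(x² + y²)) has no potential is to compute the work integral round a closed loop encircling the origin. For a conservative field this must be zero, but here it is 2π. The sketch below approximates the loop integral for circles of different radii and, for comparison, for the conservative field grad(x² + y²). The function name is illustrative.

```python
import math


def circulation(v, radius=1.0, n=2000):
    """Approximate the line integral of v . dr anticlockwise round a circle centred at the origin."""
    total = 0.0
    for i in range(n):
        t0 = 2 * math.pi * i / n
        t1 = 2 * math.pi * (i + 1) / n
        x0, y0 = radius * math.cos(t0), radius * math.sin(t0)
        x1, y1 = radius * math.cos(t1), radius * math.sin(t1)
        vx, vy = v((x0 + x1) / 2, (y0 + y1) / 2)
        total += vx * (x1 - x0) + vy * (y1 - y0)
    return total


vortex = lambda x, y: (-y / (x * x + y * y), x / (x * x + y * y))
gradient_field = lambda x, y: (2 * x, 2 * y)   # grad(x^2 + y^2), which is conservative

print(circulation(vortex, 1.0), circulation(vortex, 3.0), 2 * math.pi)
print(circulation(gradient_field, 1.0))
```

The loop integral of the whirlpool is the same for every radius, which reflects the fact that the obstruction is the missing point at the origin rather than anything happening on the loop itself.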


The fundamental theorem, Stokes’ theorem, and the divergence
theorem are one-, two-, and three-dimensional versions of the
same theorem. This was made explicit by Élie Cartan, who proved
the generalized Stokes’ theorem in 1945. This modern form of
Stokes’ theorem has an incredibly succinct statement, namely


$$\int_M d\omega = \int_{\partial M} \omega,$$


though it would take considerable effort to explain it in full. Here
M is a manifold—which is a higher-dimensional equivalent of a
curve or surface—which has boundary ∂M.


44. grad(g) on g=1. 


There is a large amount of theory relating to analysis on such
spaces. One aspect relates to constrained optimization problems;
for example, what is the maximal value that f(x, y, z) = 2x + 2y + z
takes when g(x, y, z) = x² + y² + z² = 1? The constraint g = 1
means that (x, y, z) lies on a sphere, so we are asking what is the
largest value that f attains on that sphere. The greatest value is 3,
which is achieved when (x, y, z) = (2/3, 2/3, 1/3). Note at this
point that grad(f) = (2, 2, 1) and


$$\mathrm{grad}(g) = (2x, 2y, 2z) = \left(\tfrac{4}{3}, \tfrac{4}{3}, \tfrac{2}{3}\right) = \tfrac{2}{3}\,\mathrm{grad}(f).$$


The vectors grad(f) and grad(g) are parallel; this is not a
coincidence, but rather what the theory of Lagrange multipliers
states will occur at such a maximum or minimum. This is
because the gradient vector grad(g) is everywhere perpendicular
to the surface g = 1 (Figure 44), whilst grad(f) is perpendicular to
the surface at a maximum or minimum of f (in a result akin
to Fermat’s theorem).
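
A crude experiment supports the claim that 3 is the greatest value of f(x, y, z) = 2x + 2y + z on the unit sphere. The sketch below samples many random points on the sphere and records the largest value of f found. The sampling method, seed, and sample size are arbitrary choices, and random search is only a sanity check, not the method of Lagrange multipliers itself.

```python
import math
import random


def f(x, y, z):
    return 2 * x + 2 * y + z


random.seed(0)
best = -math.inf
for _ in range(20000):
    # A random direction, scaled to lie on the unit sphere g = 1.
    p = [random.gauss(0, 1) for _ in range(3)]
    r = math.sqrt(sum(c * c for c in p))
    best = max(best, f(*(c / r for c in p)))

print(best)                  # never exceeds 3
print(f(2 / 3, 2 / 3, 1 / 3))  # the value at the claimed maximum
```

The sampled maximum creeps towards 3 from below as more points are tried, and the point (2/3, 2/3, 1/3) attains it.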


The huge field of differential geometry is essentially the application 
of multivariable calculus to the geometry of curves, surfaces, and, 
more generally, manifolds. The topology of a space, essentially its 
shape, can have important global implications for analysis on that 
space. See, for example, Topology: A Very Short Introduction. 



Chapter 6 
I’ll name that tune in...


The wave equation 


An early mathematical model employing calculus was the wave 
equation, derived by the French mathematician and music theorist 
Jean le Rond D’Alembert. This equation governs how a taut 
string—such as a guitar or piano string—makes small vibrations, 
but it is also important in the study of acoustics more generally, 
electromagnetism (including light), and fluid dynamics. 


There are various ways to make a string vibrate. In a harpsichord, the 
string is plucked and released; in a piano, a string at rest is struck by 
a ‘hammer’. How can such vibrations be described mathematically? 
At a particular instant of time t, a vibrating string will make a certain
shape, which we can represent as the graph of a function y(x). Here
x is the co-ordinate of a point along the string, and the value y(x) is
the sideways—or transverse—displacement of that point, measured
from the equilibrium position y = 0. The notion of displacement
here is similar to that of distance, but because the string might be
above or below the horizontal equilibrium, y can be positive or
negative. Combining all this means that we need a function, y(x, t),
of two variables, x and t, to fully describe the string’s position. 


This will be sufficient to describe any transverse vibration of 
the string, meaning that a point of the string with horizontal 


45(a). A plucked string initially. 45(b). A struck string initially.


co-ordinate x maintains that co-ordinate, moving only in an 
‘up-and-down’ fashion. This is a reasonable assumption for a taut 
string, making small vibrations, but would not apply if we had a 
loose string. The transverse velocity of the string is given by the
partial derivative ∂y/∂t; this is the velocity at which a particular point
of the string is moving up or down. We will assume that the string
has a constant tension T throughout and a uniform density ρ.


The initial position y(x, 0) of a harpsichord string at time t = 0 is
graphed (Figure 45(a)). Here ∂y/∂t (x, 0) = 0, signifying that the
string is initially at rest. By contrast, a piano string begins in the
equilibrium position y(x, 0) = 0 (Figure 45(b)), but the string is
initially struck by a hammer, so a part of the string, −a ≤ x ≤ a, has
an initial transverse velocity ∂y/∂t (x, 0) = v.


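In 1747 D’Alembert derived what is now known as the wave
equation, which states that


$$c^2\frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial t^2}, \quad\text{where } c^2 = \frac{T}{\rho}.$$

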
Any small transverse vibration of the string is a solution y(x, t) of 
the wave equation. 


The wave equation models how these two initial states evolve with 
time (Figures 46(a), 46(b)). For a plucked string, half the initial 
pluck moves to the right and half moves to the left. Both halves move 
with speed c = √(T/ρ). For a struck string, a central portion of the


46(a). A plucked string propagates. 46(b). A struck string propagates.


string eventually moves through a distance va/c and then stops, 
with the front and rear of the wave moving right and left, again with 
speed c. 


The wave equation is a partial differential equation, as discussed in 
Chapter 5; further, it is a second-order equation, so we might 
expect the general solution to include two arbitrary functions. 
D’Alembert solved the wave equation for an infinite string, showing 


$$y(x,t) = f(x - ct) + g(x + ct),$$


where f and g are arbitrary functions. The quantity c is the speed at 
which a wave propagates along the wire. So, if the tension T is 
greater, waves propagate faster and, if the density ρ is greater, the
waves move more slowly. As all small vibrations satisfy the wave 
equation, we need further information, such as the initial position and 
initial motion of the string, to work out specifically what f and g are. 


A solution of the form y(x, t) = f(x − ct) consists of the graph of
y = f(x) moving right along the string at speed c; to appreciate
this, note that x + ct is a point moving right with speed c, starting
at x, and f(x − ct) takes the same value at (x + ct, t) as it did at
(x, 0). Similarly a solution of the form y(x, t) = g(x + ct) consists
of the graph of y = g(x) moving left along the string at speed c. The
general solution therefore comprises two waves moving left and 
right with speed c. 
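
D’Alembert’s form of the solution can be tested numerically. The sketch below picks two smooth functions f and g (arbitrary choices made for illustration), forms y(x, t) = f(x − ct) + g(x + ct), and estimates c² ∂²y/∂x² − ∂²y/∂t² by second differences. This quantity should be close to zero if y solves the wave equation. The sketch also checks that the right-moving part takes the same value at (x + ct, t) as at (x, 0).

```python
import math

c = 2.0
f = lambda s: math.exp(-s * s)    # a pulse that will move right
g = lambda s: 0.5 * math.sin(s)   # a wave that will move left


def y(x, t):
    return f(x - c * t) + g(x + c * t)


def residual(x, t, h=1e-3):
    """Estimate c^2 * y_xx - y_tt at (x, t) using central second differences."""
    y_xx = (y(x + h, t) - 2 * y(x, t) + y(x - h, t)) / h**2
    y_tt = (y(x, t + h) - 2 * y(x, t) + y(x, t - h)) / h**2
    return c**2 * y_xx - y_tt


def right_mover(x, t):
    return f(x - c * t)


print(residual(0.3, 0.7))
print(right_mover(0.4 + c * 1.5, 1.5), right_mover(0.4, 0.0))
```

The residual is not exactly zero because the difference quotients are approximations, but it shrinks as the step h is reduced.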


Historically, however, there was still an issue with what should be 
understood by an ‘arbitrary function’. The initial starting position 
of y(x, 0) for the plucked string is physically reasonable
(Figure 45(a)), but y(x, 0) wouldn’t have met Euler’s definition
of a function being described by a single analytic expression.
18th-century mathematicians were aware of this problem, though 
it went unresolved at the time. 


Derivation of the wave equation 


Recall that a mathematical model is a description of how the real 
world behaves. It is necessarily something of an approximation, 
and reasonable, though idealized, simplifications are made so that 
the mathematics involved is not overcomplicated. What follows is a 
brief description of the wave equation’s derivation. It is a somewhat 
technical and physical argument, so you may prefer to move on to the 
next section, but it is included here because it is typical of how the 
methods of calculus get applied to real-world models. Such models 
form the most important application of calculus. 


Models often begin by focusing on a small part of an experiment, 
and as we let that part become ever smaller, limits and derivatives 
arise in our calculations. So we begin with a very small piece of the 
string, say of length h, which runs from x to x + h. The mass of this
small piece of string is ρh. Depicted in Figure 47 is this piece lying
flat, below its later position. 


47. Piece of string initially and then vibrating.


The figure shows the forces acting on this piece of string, namely
tension pulling the string to the right and to the left. We 
(reasonably) ignore gravitational effects or air resistance, as these 
forces are negligible compared with the tension T in the string. 
Because the piece of string moves only up and down—so not 
sideways—the component of the tension pulling the string to the 
right must cancel out the component pulling it left. It’s the 
difference between the up and down components of the tension at 
the two ends that makes the string move up and down. 


The gradient of the string at any point is ∂y/∂x; from this it
follows that the upward component of the tension on the right is
T ∂y/∂x (x + h, t) and the downward component of the tension on the
left is T ∂y/∂x (x, t). The string’s mass is ρh and the upward
acceleration of the piece is ∂²y/∂t². Putting all these quantities
into Newton’s second law, that force = mass × acceleration, gives


$$T\frac{\partial y}{\partial x}(x+h,t) - T\frac{\partial y}{\partial x}(x,t) = (\rho h)\frac{\partial^2 y}{\partial t^2}.$$


If we divide both sides of the equation by h and take the limit as h
becomes small, then we arrive at


$$T\frac{\partial^2 y}{\partial x^2} = \rho\frac{\partial^2 y}{\partial t^2};$$


this is the wave equation. For those interested in finding out
more about mathematical models, see Applied Mathematics: 
A Very Short Introduction. 


Boundary value problems 


Clearly a guitar or piano string is not infinite and, besides being of 
finite length L, both its ends are secured. So the displacement 
y(x,t) must satisfy the wave equation 


$$c^2\frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial t^2}, \quad\text{where } c^2 = \frac{T}{\rho},$$


and also the boundary conditions


$$y(0,t) = 0, \qquad y(L,t) = 0.$$


48. First harmonic at t = 0, t = L/(3c) and t = L/c.


These boundary conditions say that the start of the string x = 0
and the end of the string x = L are secured to the x-axis (where
y = 0) at all times. One possible mode of vibration for such a
string, which is a solution to the wave equation and its boundary
conditions, is


$$y(x,t) = A_1\sin\left(\frac{\pi x}{L}\right)\cos\left(\frac{\pi c t}{L}\right), \quad\text{for } 0 \le x \le L.$$


This manner of vibration is called the string’s first harmonic, here 
sketched at three different times (Figure 48). When t = 0, the 
string is at its highest and the very highest the string achieves, $A_1$, is
called the amplitude; as time passes, the string moves down to be 
at its lowest when t = L/c, the mirror image of its original position; 
it then moves up, returning to its original position when t = 2L/c, 
which is the period of a single oscillation. 


The second harmonic (Figure 49(a)) is given by 


$$y(x,t) = A_2\sin\left(\frac{2\pi x}{L}\right)\cos\left(\frac{2\pi c t}{L}\right), \quad 0 \le x \le L.$$


49(a). Second harmonic. 49(b). Third harmonic.


Its period is half that of the first harmonic, and its pitch is an octave 
higher than the first harmonic. 


More generally, for n = 1, 2, 3, ..., the string can vibrate as follows:


$$y_n(x,t) = A_n\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi c t}{L}\right), \quad 0 \le x \le L,$$


which is known as the nth harmonic. This oscillation has a period of
2L/(nc), n times shorter than that of the first harmonic. The third harmonic
(Figure 49(b)) has a pitch that is a fifth above that of the second 
harmonic. 


These harmonics are very special ways in which a string can 
vibrate. In general, a string will vibrate in a manner that involves 
many, indeed infinitely many, of the harmonics. Finding out how 
much of each harmonic contributes to a particular vibration is the 
study of Fourier analysis, named after the French mathematician 
Joseph Fourier. 


Fourier analysis 


In general, the vibrations of a string y(x, t) with fixed ends are not 
single harmonics but rather are a combination of different 
harmonics. So y(x, t) can be written as an infinite sum, 


\[ y(x,t) = y_1(x,t) + y_2(x,t) + y_3(x,t) + \cdots, \]


of harmonics. The function

\[ y_n(x,t) = A_n \sin\left(\frac{n\pi x}{L}\right) \cos\left(\frac{n\pi c t}{L}\right) \]


is an nth harmonic, and our task is to find the amplitude $A_n$ of each
component for a general mode of vibration y(x, t). Fourier showed
that if the string is initially at rest in the shape of the graph
y = f(x), then the amplitude $A_n$ equals the integral


\[ A_n = \frac{2}{L} \int_0^L f(x) \sin\left(\frac{n\pi x}{L}\right) \, dx \]


(as derived in the Appendix). Using these values of $A_n$ in the
formula for $y_n(x, t)$ and substituting those expressions for $y_n(x, t)$
in the infinite sum for y(x, t), we obtain the solution to the wave
equation for a vibrating string starting from rest. The expression of
y(x, t) as a combination of harmonics is called its Fourier series.
The values $A_n$ are called Fourier coefficients.
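
The recipe above can be sketched in Python. The integral for each coefficient, A_n = (2/L) ∫ f(x) sin(nπx/L) dx over 0 ≤ x ≤ L, is estimated with a simple midpoint rule, and the harmonics are then summed. The initial shape f(x) = x(L − x), the values L = c = 1, and the function names are all hypothetical choices made for illustration.

```python
import math

def fourier_coefficient(f, n, L, steps=2000):
    """Midpoint-rule estimate of A_n = (2/L) * integral_0^L f(x) sin(n pi x / L) dx."""
    dx = L / steps
    total = sum(f((k + 0.5) * dx) * math.sin(n * math.pi * (k + 0.5) * dx / L)
                for k in range(steps))
    return 2.0 / L * total * dx

def partial_sum(coeffs, x, t, L, c):
    """Sum of the harmonics y_n(x, t) for n = 1, ..., len(coeffs)."""
    return sum(A * math.sin(n * math.pi * x / L) * math.cos(n * math.pi * c * t / L)
               for n, A in enumerate(coeffs, start=1))

# Hypothetical initial shape: a smooth arch on a string of length 1.
L, c = 1.0, 1.0
arch = lambda x: x * (L - x)
coeffs = [fourier_coefficient(arch, n, L) for n in range(1, 30)]

# At t = 0 the partial sum should be close to the initial shape.
print(partial_sum(coeffs, 0.5, 0.0, L, c), arch(0.5))
```

Only finitely many harmonics are summed here, so the result is an approximation to the full Fourier series rather than the series itself.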


As a first example of a Fourier series, we will set $L = \pi$ (which
simplifies the calculations a little) and consider a plucked string at
rest at time t = 0 (Figure 50(a)).


The integrals $A_n$ for the plucked string can be evaluated, and the
plucked string’s Fourier series turns out to equal


\[ \frac{4}{\pi} \left( \sin(x) - \frac{1}{3^2} \sin(3x) + \frac{1}{5^2} \sin(5x) - \frac{1}{7^2} \sin(7x) + \cdots \right). \]


50(a). Plucked string. 50(b). $f(x) = x^2 (x - \pi)^2$.


If we set $x = \pi/2$, the above sum converges to $f(\pi/2) = \pi/2$, and,
rearranging, we would deduce


\[ 1 + \frac{1}{3^2} + \frac{1}{5^2} + \frac{1}{7^2} + \cdots = \frac{\pi^2}{8}, \]


which is correct. In fact, the above Fourier series converges to
the plucked string function for all x in the range $0 \le x \le \pi$. More
generally, the convergence of Fourier series turns out to be quite 
a subtle matter. 
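
The convergence at x = π/2 can be explored numerically. The sketch below sums the first few thousand terms of the series (4/π)(sin x − sin 3x/3² + sin 5x/5² − ...) and, separately, the sum of the reciprocals of the odd squares. The formula used for the plucked shape (rising as x to the apex at π/2, then falling as π − x) is an assumption inferred from f(π/2) = π/2 and Figure 50(a).

```python
import math

def plucked_partial_sum(x, terms):
    """First `terms` non-zero terms of (4/pi)(sin x - sin 3x / 3^2 + sin 5x / 5^2 - ...)."""
    return 4 / math.pi * sum((-1) ** k * math.sin((2 * k + 1) * x) / (2 * k + 1) ** 2
                             for k in range(terms))

def plucked(x):
    """Assumed plucked shape: x up to the apex at pi/2, then pi - x."""
    return x if x <= math.pi / 2 else math.pi - x

# Sum of 1/1^2 + 1/3^2 + 1/5^2 + ..., to be compared with pi^2 / 8.
odd_square_sum = sum(1 / (2 * k + 1) ** 2 for k in range(100000))

print(plucked_partial_sum(math.pi / 2, 2000), math.pi / 2)
print(odd_square_sum, math.pi ** 2 / 8)
```

Partial sums can only suggest convergence; they do not prove it.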


The impact of Fourier’s work 


Fourier analysis was initially developed to model heat flow in solid 
bodies; Fourier’s seminal work The Analytical Theory of Heat was 
published in 1822, though his initial results date back to 1807. He 
derived the heat equation which models heat flow in solid bodies; 
this is another partial differential equation which can be solved 
using Fourier analysis in a manner similar to how the wave 
equation was solved. Since then, Fourier analysis—including the 
related Fourier transform—has found many applications in 
mathematics and science, in the study of partial differential 
equations and in acoustics, optics, NMR (nuclear magnetic 
resonance), and signal and image processing. 


Fourier was not the first to employ trigonometric series to solve 
partial differential equations—as early as 1753 Daniel Bernoulli 
had argued for solving the wave equation with such series. Fourier, 
though, determined formulae for the Fourier coefficients and 
recognized the generality of his methods. In fact, he was initially 
overconfident in their use, claiming (incorrectly) that any 
function—continuous or discontinuous—could be represented by 
an infinite trigonometric series. 


The convergence—or otherwise—of Fourier series then became an 
important question in analysis, with much progress being made in
1829 by Gustav Dirichlet in a significant memoir. If a function is
differentiable, then its Fourier series converges to the function. Fourier 
series even handle jump discontinuities in such functions and 
converge to the average of the function either side of the discontinuity. 
However, in 1873, a continuous function was found with a Fourier 
series that did not converge to the function. 


The convergence of a Fourier series is intimately linked with the 
function’s differentiability. The Riemann-Lebesgue lemma 
shows that the Fourier coefficients necessarily converge to zero as n 
increases, but the speed of that convergence depends on the 
differentiability of the function. The coefficients of the plucked
string (Figure 50(a)) decrease like $1/n^2$. Once calculated, the
Fourier coefficients of $x^2 (x - \pi)^2$ decrease more quickly, like $1/n^3$
(Figure 50(b)). This is because the plucked string has points, such
as the apex, where it is not differentiable, whereas $x^2 (x - \pi)^2$ is
differentiable everywhere. This pattern continues, with coefficients 
decreasing ever faster for functions that are twice differentiable, 
and so on. A further aspect is Gibbs’ phenomenon: Fourier series 
converge poorly at discontinuities of a function or its derivatives. 
For example, the plucked string and its approximation using the 
first four non-zero terms of its Fourier series differ most at the apex 
where the derivative is discontinuous (Figure 51). 
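
The two rates of decay can be compared numerically. In this sketch the coefficients of both functions of Figure 50 are estimated by a midpoint rule on 0 ≤ x ≤ π, then multiplied by n² (plucked string) or n³ (smooth function); if the stated rates are right, the scaled values should level off as n grows. The plucked shape min(x, π − x) is an assumed formula for Figure 50(a), and only odd n are used because the even coefficients vanish by symmetry.

```python
import math

def sine_coefficient(f, n, steps=20000):
    """Midpoint-rule estimate of (2/pi) * integral_0^pi f(x) sin(n x) dx."""
    dx = math.pi / steps
    return 2 / math.pi * dx * sum(f((k + 0.5) * dx) * math.sin(n * (k + 0.5) * dx)
                                  for k in range(steps))

plucked = lambda x: min(x, math.pi - x)          # assumed shape of Figure 50(a)
smooth = lambda x: x**2 * (x - math.pi) ** 2     # the function of Figure 50(b)

odd = [11, 21, 41, 81]
plucked_scaled = [n**2 * abs(sine_coefficient(plucked, n)) for n in odd]
smooth_scaled = [n**3 * abs(sine_coefficient(smooth, n)) for n in odd]

print(plucked_scaled)  # roughly constant if the coefficients decay like 1/n^2
print(smooth_scaled)   # roughly constant if the coefficients decay like 1/n^3
```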


51. Gibbs’ phenomenon.


Beyond the applications of Fourier series, the investigation into their
convergence would provide enormous impetus for mathematicians to think
about functions more broadly, such as in Riemann’s theory of
integration, and make clearer the need for rigour in analysis.
Cantor’s work on different infinities (Chapter 1) began with his 
study of Fourier series. Fourier analysis also raised the question: 
what is special about trigonometric functions that makes Fourier 
series so powerful? Put another way, what other infinite sums of 
functions should mathematicians be considering? 


Can you hear the shape of a drum? 


A natural generalization of the wave equation governs how
two-dimensional membranes, such as a circular drum or a rectangular
membrane, can vibrate. For a rectangular membrane with sides of
length $L_1$ and $L_2$ and a fixed boundary, the harmonics are given by


\[ z(x,y,t) = A_{mn} \sin\left(\frac{m\pi x}{L_1}\right) \sin\left(\frac{n\pi y}{L_2}\right) \cos\left(\pi c t \sqrt{\frac{m^2}{L_1^2} + \frac{n^2}{L_2^2}}\right), \]


where m and n are positive integers. The period of the above
harmonic is

\[ \frac{2/c}{\sqrt{\dfrac{m^2}{L_1^2} + \dfrac{n^2}{L_2^2}}}, \]

and recall that the periods for a string of length L are $2L/(nc)$. In both
cases, this means that if we know the periods of the harmonics 
(and speed of propagation c), we can work out the length of the 
string or the dimensions of the rectangular membrane. But is this 
generally the case? If we know the periods of a membrane’s 
harmonics, can we deduce the shape of the membrane? This is a
difficult question which wasn’t answered until 1992 and then in
the negative. 


Note that a rectangular drum with length 1 and width 2 has the same 
periods as one with length 2 and width 1, but two such drums that 
are rotations of one another are considered to have the same shape.
However, there are homophonic drums with different shapes
(Figure 52). ‘Homophonic’ here means that their harmonics have the
same periods. Note that the two drums have the same area; this is 
not coincidental, as the area of a drum can be deduced from the 
harmonics’ periods. 


The above is an example of an inverse problem in mathematics: 
given the observable behaviour of a system, can we infer the 
underlying mechanisms that cause that behaviour? This topic is a 
massively important area of science generally where a study of the 
inner working of a system is practically impossible—for example, 
learning about the Earth’s core from studies of how seismic waves 
travel through the Earth. 


The spectral theorem 


The harmonics for a vibrating string have a relatively simple form, 
as do those for the rectangular membrane, being expressible using 
trigonometric functions. But it is a lot less clear what we might 
mean by the harmonics of a circular drum or those of even more 
exotic shapes as in Figure 52. 


52. Different homophonic drums.


53(a). $4x^2 + 4xy + 4y^2 = 1$. 53(b). Slow normal mode. 53(c). Fast
normal mode.


Digressing briefly, consider what the curve with equation
$4x^2 + 4xy + 4y^2 = 1$ looks like. One way of approaching this is to
rewrite the equation as


\[ 3(x + y)^2 + (y - x)^2 = 1, \]


or as $3X^2 + Y^2 = 1$, where X = x + y and Y = y − x. In these
new co-ordinates, X and Y, the curve has the equation of
an ellipse (as sketched in Figure 53(a)); sketched also are the new
X- and Y-axes. Note that in the new co-ordinates the equation has
no ‘mixed’ XY term and that the new X- and Y-axes are still
perpendicular.


The expression $4x^2 + 4xy + 4y^2$ is called a quadratic form in
two variables, x and y, and such forms can naturally be
generalized to three or more variables. In 1829 Cauchy proved the
spectral theorem, which shows that, for any quadratic form,
new variables X, Y, Z, ..., can be introduced to eliminate all the
mixed terms, so that only terms like $X^2, Y^2, Z^2, \ldots$ remain, and,
further, the new X-, Y-, Z-axes etc. will still be perpendicular.
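
For the two-variable example, the change of co-ordinates can be checked directly. This sketch compares 4x² + 4xy + 4y² with 3X² + Y², where X = x + y and Y = y − x, at randomly chosen points, and confirms that the new axes are perpendicular by taking a scalar product. The sample range and the random seed are arbitrary choices.

```python
import random

def q(x, y):
    """The quadratic form 4x^2 + 4xy + 4y^2."""
    return 4 * x**2 + 4 * x * y + 4 * y**2

def q_new(x, y):
    """The same form written in the new co-ordinates X = x + y, Y = y - x."""
    X, Y = x + y, y - x
    return 3 * X**2 + Y**2

random.seed(0)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
max_gap = max(abs(q(x, y) - q_new(x, y)) for x, y in samples)

# The X-axis (where Y = 0) runs along (1, 1); the Y-axis (where X = 0) along (-1, 1).
x_axis, y_axis = (1, 1), (-1, 1)
dot = x_axis[0] * y_axis[0] + x_axis[1] * y_axis[1]

print(max_gap, dot)  # a gap at rounding-error level, and a scalar product of 0
```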


The spectral theorem is important and widely applicable. For 
example, for small oscillations of a double pendulum (Figure 53)
about its equilibrium (where both pendula hang vertically), its
gravitational potential energy is a quadratic form in the angles $\alpha$
and $\beta$. By the spectral theorem, new co-ordinates can be found and
special normal modes of oscillation (akin to the previous 
harmonics) can be identified. Every small oscillation would be 
expressible as a sum of these two normal modes. There is a slow 
mode when the rods move together (Figure 53(b)) and a fast mode 
when they move in opposite directions (Figure 53(c)). 


One problem is that these examples involve finitely many variables, whilst
we have seen that a taut string has infinitely many harmonics.
What is needed is an infinite-dimensional version of the spectral 
theorem, and such does exist; that theorem is within the area of 
functional analysis—essentially infinite-dimensional analysis— 
which developed in the first half of the 20th century through the 
work of Stefan Banach, David Hilbert, and John von Neumann. 


When a string’s vibration y(x, t) is expressed as an infinite sum of 
the harmonics (or normal modes) 


\[ y_n(x,t) = A_n \sin\left(\frac{n\pi x}{L}\right) \cos\left(\frac{n\pi c t}{L}\right); \]


the energy in the string is an infinite quadratic form involving the
square terms $A_n^2$ but no mixed terms $A_m A_n$, where m and n are
distinct. Within this setting, the string’s harmonics (which are
functions) are still perpendicular to one another in a technical sense;
this perpendicularity manifests in the vanishing of integrals such as


\[ \int_0^L \sin\left(\frac{m\pi x}{L}\right) \sin\left(\frac{n\pi x}{L}\right) \, dx = 0, \]


when m and n are distinct. This is akin to the fact that two vectors
$u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ in three dimensions are
perpendicular if

\[ u_1 v_1 + u_2 v_2 + u_3 v_3 = 0. \]

(The expression on the left is the scalar product of u and v.) The
previous integral is just a continuous sum equivalent of this. It was
through using this ‘perpendicularity’ that Fourier was able to 
determine his Fourier coefficients (see Appendix for details). 
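
The parallel between the integral and the scalar product can be made concrete. In the sketch below, the integral of sin(mπx/L) sin(nπx/L) over 0 ≤ x ≤ L is estimated by a midpoint rule, which is literally a sum of products, just as the scalar product is. The choice L = 1, the particular values of m and n, and the two example vectors are illustrative assumptions.

```python
import math

def sine_inner_product(m, n, L=1.0, steps=5000):
    """Midpoint-rule estimate of integral_0^L sin(m pi x / L) sin(n pi x / L) dx."""
    dx = L / steps
    return dx * sum(math.sin(m * math.pi * (k + 0.5) * dx / L) *
                    math.sin(n * math.pi * (k + 0.5) * dx / L)
                    for k in range(steps))

def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

distinct = sine_inner_product(2, 5)    # distinct harmonics: close to 0
same = sine_inner_product(3, 3)        # a harmonic with itself: close to L/2
vectors = dot((1, 2, 3), (3, 0, -1))   # 3 + 0 - 3 = 0, so perpendicular

print(distinct, same, vectors)
```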


Distributions 


Reality is complicated, so mathematical models make simplifying, 
but reasonable, assumptions. Such assumptions lead to simpler 
equations that can be solved, or more easily analysed. Common 
simplifying assumptions include point charges and point masses in 
electromagnetism and gravity. A subatomic particle, with non-zero 
mass, might be modelled as a point, which has zero volume; given 
that an electron has mass 


0.00000000000000000000000000000091 kg, 


this might seem a reasonable simplification. However, it would still 
mean, as a point mass, that this particle has infinite density, 
because its non-zero mass is contained within zero volume. 


Seemingly wilder simplifications get made. For example, a proof 
that the Earth moves around the Sun in an elliptical orbit would 
treat both the Sun and Earth as point masses, despite the Sun 
having mass 


1,989,000,000,000,000,000,000,000,000,000 kg.


However, this simplification is still reasonable, as it can be 
mathematically shown that the gravitational field outside a 
uniform, spherical body is exactly the same as it would be if all that 
mass were at the body’s centre. 


Still, these simplifications can lead to difficulties, particularly if we 
wish to refer to density. If we consider a point mass of mass 1 at
x = 0 on the real number line, then the density of matter $\delta(x)$
satisfies the following:


\[ \delta(x) = 0 \text{ when } x \neq 0, \qquad \int_{-\infty}^{\infty} \delta(x) \, dx = 1. \]


The first equation states there is no matter except at the origin 0. In
the second equation, the integral of the density is the total mass on
the whole real line, which we know is 1. The problem is that no
function $\delta(x)$ has these properties! The ‘function’ $\delta(x)$ is called the
Dirac delta function, named after Paul Dirac, the theoretical 
physicist who introduced it in 1930 in his influential book The 
Principles of Quantum Mechanics. But at that time its status as a 
mathematical object was still unclear. 


Nowadays, the Dirac delta function is fully understood as a 
Schwartz distribution or generalized function. Laurent 
Schwartz developed the theory of distributions in the late 1940s, 
for which he won a Fields Medal in 1950. Distribution theory now 
sits within the field of functional analysis. 


In contrast with functions, which need specifying at every point, 
distributions might be thought of as having well-defined local 
averages. This makes them useful in modelling real-world 
situations. For example, when you hold an object and then let it 
drop, what can be said of the object’s acceleration? There is zero 
acceleration before the drop, and the acceleration is that of gravity 
after the drop, but what is the acceleration at the instant of the 
drop? In practice this doesn’t matter, but it would if we wished to 
describe acceleration with a function rather than a distribution. 


The acceleration here is discontinuous at the drop, changing from 
zero to the value of gravity. But astonishingly, as a distribution, 
the acceleration can be differentiated and its derivative is a 
multiple of the Dirac delta function. None of this makes any 
sense within the traditional context of calculus, where differentiable
functions are continuous, but within the context of distributions
it all makes rigorous sense. Schwartz didn’t just tidy up 
previously unrigorous loose ends; the theory of distributions 
proved powerful because of the richness of their calculus and the 
breadth of their applications. 
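
One informal way to picture the delta function is as a limit of ever narrower bumps, each of total mass 1. The sketch below uses Gaussian bumps of width eps (one common choice among many, and an assumption here rather than the book's construction). It estimates the total mass, the density at a point away from the origin, and the bump-weighted average of cos(x), which should approach cos(0) = 1 as eps shrinks.

```python
import math

def bump(x, eps):
    """Gaussian of width eps with total integral 1; a stand-in for delta when eps is small."""
    return math.exp(-((x / eps) ** 2) / 2) / (eps * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=20000):
    """Midpoint-rule estimate of the integral of f over [a, b]."""
    dx = (b - a) / steps
    return dx * sum(f(a + (k + 0.5) * dx) for k in range(steps))

def delta_checks(eps):
    """Return (total mass, density at x = 0.5, local average of cos) for width eps."""
    total = integrate(lambda x: bump(x, eps), -1.0, 1.0)
    away = bump(0.5, eps)
    average = integrate(lambda x: bump(x, eps) * math.cos(x), -1.0, 1.0)
    return total, away, average

for eps in (0.1, 0.01):
    print(eps, delta_checks(eps))
```

No single bump is the delta function; it is the behaviour of the averages as eps tends to zero that distribution theory makes precise.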


Quantum theory 


Quantum theory developed in the early 20th century to describe 
certain phenomena of subatomic particles which classical 
mechanics could not penetrate. Foremost among these 
phenomena, and what gave quantum theory its name, were 
experiments which showed that particles could have only certain 
discrete energies, or ‘quanta’. 


In quantum theory, the state of a particle is described by a wave
function $\psi(x, t)$, which satisfies Schrödinger’s equation. For the
so-called ‘particle in a box’ model, Schrödinger’s equation takes a
form such as the following boundary value problem:


\[ i \frac{\partial \psi}{\partial t} = -\frac{h}{4\pi m} \frac{\partial^2 \psi}{\partial x^2}, \qquad \psi(0, t) = 0 = \psi(L, t), \]


where L is the length of the box, x denotes position, t denotes time,
m is the particle’s mass, and h is Planck’s constant. Here $i = \sqrt{-1}$
and $\psi(x, t)$ are complex, rather than real, numbers. Quantum
theory is naturally set in the language of complex numbers, which 
we are about to meet in Chapter 7. 


As with the wave equation, there are certain states (the normal 
modes) that are special solutions to Schrödinger’s equation, each
with a different energy. The wave function is, in general, an infinite 
sum (or superposition) of these different states, but once energy is 
measured, the wave function collapses to one of these states. The 
Fourier coefficient of each state relates to the probability that the 
wave function collapses to that state.


The rise of quantum theory in the early 20th century
introduced many counter-intuitive phenomena, and it was 
mathematics, and in particular functional analysis and the 
work of John von Neumann, that helped to give the subject 
rigorous foundations. 




Chapter 7 
Putting the i in analysis 


Complex numbers 


Quantum theory—physics at the subatomic level—is more 
naturally described in the language of complex numbers than with 
real numbers. What follows is a description of how 
mathematicians became interested, albeit hesitantly, in these 
so-called ‘complex’ numbers. 


A need for complex numbers arose during the Renaissance as new 
ways were found to solve polynomial equations such as 


\[ z^3 - 2z^2 - z + 2 = 0. \]


The degree of this equation is 3, this being the highest power of z. 
Our aim is to find all the solutions z. We can see that z = 1 is a 
solution because 


\[ 1^3 - (2 \times 1^2) - 1 + 2 = 0. \]


You might check that z = 2 is also a solution and so is z = −1. And
that’s all of them: three solutions, z = 1, −1, 2. Other polynomials,
though, seem to have no solutions. For example, the degree 2
equation


\[ z^2 = -1 \]


has no real numbers as solutions. If you take a positive number, z,
then its square $z^2$ is also positive, and so cannot equal −1; if you
take a negative number, then its square is also positive; finally,
$0^2 = 0$. So no real number z squares to −1.


The ancient Babylonians knew how to find all, if any, positive
solutions of a degree 2 equation. But it wasn’t until the
16th century that general equations of degrees 3 and 4 were solved
by Italian mathematicians. However, a problem they had was that 
their methods necessitated calculations involving the square roots 
of negative numbers, even when all the equation’s solutions were 
real numbers. At this time most mathematicians were ill at ease 
with negative numbers, let alone their square roots. But if they 
imagined square roots of negative numbers to exist, and if all these 
disturbing ‘imaginary’ numbers cancelled out at the end of the 
calculation, then their methods did indeed yield correct solutions. 


If we imagine there to be a solution to the equation $z^2 = -1$,
denoted by i, and if it otherwise adds and multiplies like other
numbers, then we can solve polynomial equations that we couldn’t
previously solve. For example, the number z = 3 + 2i satisfies


\[ z^2 - 6z + 13 = 0, \]


because


\[
\begin{aligned}
(3 + 2i)^2 - 6 \times (3 + 2i) + 13
  &= (9 + 12i + 4i^2) - (18 + 12i) + 13 \\
  &= 9 + 12i - 4 - 18 - 12i + 13 = 0,
\end{aligned}
\]


the last line following because $i^2 = -1$. Numbers of the form
x + yi, where x and y are real numbers and $i^2 = -1$, are called
complex numbers.
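
Python has complex numbers built in (writing the imaginary unit as `j` rather than i), so calculations of this kind are easy to repeat. This small sketch checks that 3 + 2i satisfies z² − 6z + 13 = 0, tries the conjugate 3 − 2i as well, and revisits the three real solutions of the earlier cubic. The variable and function names are illustrative.

```python
z = 3 + 2j                        # Python's notation for 3 + 2i
value = z**2 - 6 * z + 13         # should be 0

# The conjugate 3 - 2i turns out to satisfy the same equation.
w = z.conjugate()
conjugate_value = w**2 - 6 * w + 13

def cubic(t):
    """The degree 3 polynomial z^3 - 2z^2 - z + 2 from earlier in the chapter."""
    return t**3 - 2 * t**2 - t + 2

cubic_values = [cubic(t) for t in (1, -1, 2)]

print(value, conjugate_value, cubic_values)
```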


It’s a perfectly reasonable question to ask what a number like 
3 + 2i could possibly signify. A Renaissance mathematician might
have asked the same of a negative number; after all, you can’t
have −30 apples. But we now habitually see negative numbers
describing temperatures, deficit bank balances, or x-co-ordinates 
of points to the left of the y-axis. In a similar manner, particularly 
during the 19th century, mathematicians began finding complex 
numbers so useful as to put aside any philosophical qualms. In 
particular, the fundamental theorem of algebra showed that all 
the solutions of any polynomial equation are complex numbers; 
more precisely, a polynomial of degree n has n solutions among the 
complex numbers (counting any repetitions). 


Cauchy 


The worth of complex numbers would soon become even more 
apparent because of complex analysis as developed by the French 
mathematician Augustin-Louis Cauchy (Figure 54). Cauchy was a 
titan of 19th-century mathematics who prolifically made 
contributions across a wide range of topics, and his name is 
associated with a diverse range of concepts and theorems, arguably 
more so than any other mathematician. His work ranges from 
algebra—results relating to symmetry and a first version of the 
spectral theorem—through to the study of polyhedra, differential 
equations, and solid mechanics and elasticity. 


His seminal Cours d'Analyse of 1821 on real analysis included early
use of $\varepsilon$-$\delta$ arguments (Bolzano has priority in this, though his work
went unnoticed), but Cauchy is most remembered for his almost
single-handed development of complex analysis during the 1820s.


Complex analysis is one of the most harmonious, yet applicable, 
of mathematical subjects. That the complex numbers should prove 
so powerful may seem surprising at this point. Cauchy took the 
ideas of differentiation and integration for real functions and 
generalized them, in fairly natural ways, to functions with complex 
numbers for inputs and outputs. It turns out that the analysis of
such functions is far richer and more powerful than is the case with
real functions.


54. Cauchy.


This may only seem interesting in the abstract. However, a 
problem involving real numbers, which ostensibly has nothing to 
do with complex numbers, might be generalized and extended to a 
problem involving complex numbers. At that point a wealth of 
theorems becomes applicable. This is what the French
mathematician Jacques Hadamard meant by: ‘The shortest path
between two truths in the real domain passes through the complex 
domain.’ This is a powerful, general approach in mathematics: take 
a problem out of its natural or initial setting and pose it as a related 
problem in a setting where more theory can be applied. The actual 
answer we are looking for is typically still a real number, but we 
may find the desired answer more easily as the real or imaginary 
part of a complex number. 


The complex plane 


The real numbers are commonly represented on an infinite real 
number line with numbers increasing as we move from left to right. 
However, complex numbers, which have the form z = x + yi, are
two-dimensional in nature: x is called the real part of z and y is
called the imaginary part of z. So it makes more sense to represent
complex numbers using the xy-plane and having, say, the point
with co-ordinates (1,2) represent the complex number 1 + 2i 
(Figure 55(a)). 


When represented in this way, the plane is referred to as the
complex plane. (The complex plane is also commonly referred to
as the Argand diagram, after Jean-Robert Argand, who in 1806
represented complex numbers so, though Caspar Wessel did this
earlier, in 1799.) Note that the complex plane includes the real line;
the x-axis contains numbers of the form x + 0i = x, which are just
the real numbers, so it is referred to as the real axis, and the y-axis
is referred to as the imaginary axis.


55(a). The complex plane. 55(b). Multiplying by i.


It may seem that the complex plane is just another version
of the real xy-plane, but it has an important extra feature that
is key to how different the analysis, geometry, and algebra of 
the complex plane are from those of the real plane. In the 
complex plane we can multiply by i (and there is no equivalent 
natural feature in the real plane). If z = x + yi, then
iz = −y + xi. Note that the point iz is at the same distance
from the origin as z and is at a right angle anticlockwise 
around from z (Figure 55(b)). This means that the complex 
plane comes with an orientation, a natural sense of 
anticlockwise, which the real plane does not. Put another way, 
the real plane is essentially an infinite sheet of paper with axes 
drawn on it; by comparison, the complex plane is an infinite 
sheet of paper with axes and a label saying ‘THIS WAY UP’.
This orientation has important consequences for which 
transformations of the complex plane are of natural interest in 
complex analysis. 
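
The effect of multiplying by i can be seen numerically with Python's `cmath` module. The sketch takes z = 1 + 2i, as in Figure 55, and compares the distance from the origin and the angle (measured anticlockwise from the positive real axis) before and after multiplication.

```python
import cmath
import math

z = 1 + 2j
iz = 1j * z                                  # should be -2 + 1j

same_distance = abs(iz) - abs(z)             # close to 0: the distance is unchanged
turn = cmath.phase(iz) - cmath.phase(z)      # angle turned through, in radians

print(iz, same_distance, turn, math.pi / 2)
```

A positive value of `turn` corresponds to an anticlockwise rotation, which is the orientation the text describes.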


Two maps of the complex plane 


Consider the complex function $p(z) = z^2$. For z = x + yi,


\[ p(z) = (x + yi)^2 = x^2 + 2ixy + i^2 y^2 = (x^2 - y^2) + 2xyi. \]


If we write u(x, y) and v(x, y) for the real and imaginary parts of
p(x + yi), then


\[ u(x, y) = x^2 - y^2, \qquad v(x, y) = 2xy. \]


56(a). The unit square. 56(b). Its image under $p(z) = z^2$. 56(c). Its
image under $q(z) = \bar{z}$.


Our second map is called complex conjugation. Given a complex
number z = x + yi, its conjugate is $q(z) = \bar{z} = x - yi$. Note that $\bar{z}$
is the reflection of z in the real axis, so q(z) flips the complex
plane and its ‘THIS WAY UP’ label. In terms of real and imaginary
parts, we have


\[ u(x, y) = x, \qquad v(x, y) = -y. \]


It is possible to see the effects that p(z) and q(z) have on the unit
square (Figure 56). In many ways both these maps are ‘nice’;
certainly, the partial derivatives $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, $\partial v/\partial y$ all
exist. In fact q(z) is particularly nice, as it also preserves distance.
However, as we shall soon see, it is precisely q(z) which is
problematic when viewed as a map of the complex plane.


Complex differentiability and the Cauchy-Riemann 
equations 


Given a function f(z), which has inputs and outputs which are
complex numbers, we say that f(z) is complex differentiable if the
limit of


\[ \frac{f(z + h) - f(z)}{h} \]


exists as the complex number h becomes small, that is, as h
converges to 0 in the complex plane. If the above limit exists, then
it is again denoted by $f'(z)$ as in Chapter 2, but this limit is a
complex number, rather than a real one. So we can’t think of the
above fraction as the gradient of a chord as we previously did, but it
is a natural generalization to complex functions of the limit
defining the derivative of real functions from Chapter 2.


57. Different ways in which the complex number h can become small.


As the complex plane is two-dimensional, h can converge to 0 in a
variety of ways. Previously, on the real line, a real number h could
approach zero essentially ‘from the left’ (as a small negative number)
or ‘from the right’ (as a small positive number). In the complex
plane, these would correspond to h approaching zero along the real
axis (Figure 57(a)), and in this case the above limit would equal
$\partial f/\partial x$, where x is the real part of z. But there are many other ways in
which h can become small (Figures 57(b), 57(c)). The important
point here is that we are requiring the same limit $f'(z)$ to exist,
however h becomes small; we will see this is more demanding than
just insisting that the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ both exist.


Consider now whether the functions $p(z) = z^2$ and $q(z) = \bar{z}$ are
complex differentiable. Note


\[ \frac{p(z + h) - p(z)}{h} = \frac{(z + h)^2 - z^2}{h} = 2z + h, \]

which has limit 2z as h becomes small in any way whatsoever.
The algebra involved is identical to that seen in Chapter 2 when
we differentiated $x^2$.


However, q(z) is not complex differentiable. Note that


\[ \frac{q(z + h) - q(z)}{h} = \frac{\overline{z + h} - \bar{z}}{h} = \frac{\bar{h}}{h}. \]


The conjugate of a complex number is its reflection in the real axis.
So if h is real, then it equals its conjugate, giving $\bar{h}/h = 1$, and we
get a limit of 1 as h approaches 0 along the real axis. But if h is a
purely imaginary number (that is, on the imaginary axis), then
$\bar{h} = -h$, and so $\bar{h}/h = -1$, giving a limit of −1. Because these are
different answers, this means that $\bar{h}/h$ doesn’t have a limit as h
becomes small.


So conjugation is not a ‘nice’ (= complex differentiable) function; 
this may seem surprising, as conjugation is just reflection in the 
real axis. The problem is that conjugation does not respect the 
‘THIS WAY UP’ label on the complex plane and turns the plane 
over, reversing its orientation. 
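
The contrast between the two maps shows up clearly if the quotient (f(z + h) − f(z))/h is computed for small steps h in different directions. In this sketch the point z0 = 1 + 2i and the step size 10⁻⁶ are arbitrary choices; `p` is squaring and `q` is conjugation.

```python
def quotient(f, z, h):
    """The difference quotient (f(z + h) - f(z)) / h for a complex step h."""
    return (f(z + h) - f(z)) / h

p = lambda z: z**2
q = lambda z: z.conjugate()

z0 = 1 + 2j
steps = {"real": 1e-6, "imaginary": 1e-6j, "diagonal": 1e-6 * (1 + 1j)}

# For p the three quotients nearly agree (all close to 2 * z0);
# for q they depend on the direction in which h approaches 0.
p_quotients = {name: quotient(p, z0, h) for name, h in steps.items()}
q_quotients = {name: quotient(q, z0, h) for name, h in steps.items()}

print(p_quotients)
print(q_quotients)
```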


As commented earlier, being complex differentiable is more
restrictive than in the real case. If u(x, y) and v(x, y) are the real
and imaginary parts of a complex differentiable function f(x + yi),
then it can be shown that


\[ f'(z) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \]


if h becomes small along the real axis. But if h becomes small along
the imaginary axis, we get


\[ f'(z) = \frac{\partial v}{\partial y} - i \frac{\partial u}{\partial y} \]


(see the Appendix for details). Because f(z) is complex
differentiable, these two limits must be the same, and hence these
two expressions have equal real parts and equal imaginary parts.
This gives us the Cauchy-Riemann equations,


\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}. \]


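We can check the Cauchy-Riemann equations hold for p(z), where
we have

\[ u(x, y) = x^2 - y^2 \quad \text{and} \quad v(x, y) = 2xy, \]

so that

\[ \frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial v}{\partial x} = 2y = -\frac{\partial u}{\partial y}. \]

However, for q(z) we have u(x, y) = x and v(x, y) = −y, and
we see

\[ \frac{\partial u}{\partial x} = 1 \neq -1 = \frac{\partial v}{\partial y}. \]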


This again shows that conjugation is not complex differentiable. 
In practice, a function satisfying the Cauchy-Riemann 
equations is complex differentiable. I write ‘in practice’, as 
some further mild technical constraints also need to be satisfied. 
But, broadly speaking, we can think of a complex differentiable 
function f(x + yi) = u(x, y) + v(x, y)i as one with continuous
partial derivatives $\partial u/\partial x$, $\partial u/\partial y$, $\partial v/\partial x$, $\partial v/\partial y$ which satisfy
the Cauchy-Riemann equations. By comparison, a function
from the real plane to the real plane is differentiable if these
four partial derivatives exist and are continuous; this is much
less of a requirement, and one which is satisfied by


\[ q(x + yi) = x - yi. \]


Holomorphic functions


Complex differentiable functions defined on a region of the 
complex plane are called holomorphic or analytic functions, but, 
rather than introduce further technical language, we shall continue 
referring to them as ‘complex differentiable’. 


The theory of complex analysis is far richer than that of real analysis. 
Because we have been more restrictive in our definitions, there is a 
payoff in that we can prove a lot more. However, complex differentiable
functions are still plentiful: they include all functions that we are likely 
to be interested in, such as polynomials, trigonometric functions, and 
exponentials, and, with a little care in their definitions, logarithms, 
roots, and powers are also complex differentiable (see Appendix). So 
the strength of complex analysis comes from being able to apply a 
richer theory to a still broad set of functions. 


For a function f(z) to be complex differentiable, its derivative f’(z) 
must exist. For real functions the story might stop there—a real 
function f(x) might be once but not twice differentiable. Such a 
function is 


\[ f(x) = \begin{cases} x^2 & \text{if } x \ge 0; \\ -x^2 & \text{if } x < 0. \end{cases} \]


It has derivative $f'(x) = 2|x|$, and we saw in Chapter 2 that
the modulus function is not differentiable. However, the derivative
of a complex differentiable function is always itself complex 
differentiable; this means that a complex differentiable function 
can be differentiated repeatedly. 


In fact, complex differentiable functions are even nicer than this: 
they are also analytic, meaning they can be expressed, locally at 
least, by their Taylor series; we noted in Chapter 3 that this is not 
generally true of real functions, even those that can be repeatedly 
differentiated. 




Complex trigonometric functions and the 
exponential function 


In Chapter 3 we saw how the sine, cosine, and exponential functions 
can be defined for real inputs by power series. These same power 
series also converge when the input is a complex number. In 
important ways, these complex versions of sine, cosine, and the 
exponential function are very different from their real counterparts, 
but using complex numbers, we can appreciate a deep connection 
between these three functions in the form of Euler’s identity.


We saw earlier that the power series for sine, cosine, and the 
exponential function are 


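\[ \sin(z) = z - \frac{z^3}{3!} + \frac{z^5}{5!} - \frac{z^7}{7!} + \frac{z^9}{9!} - \cdots, \]
\[ \cos(z) = 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \frac{z^6}{6!} + \frac{z^8}{8!} - \cdots, \]
\[ e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \frac{z^4}{4!} + \frac{z^5}{5!} + \cdots. \]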


These three series converge for all complex numbers z and define 
complex differentiable functions on the entire complex plane. 
Again, the derivative of sin(z) is cos(z) and the derivative of $e^z$ is $e^z$.
Other identities such as


\[ (\cos(z))^2 + (\sin(z))^2 = 1 \]


still hold true for complex z. However, this no longer implies that
sin(z) and cos(z) lie between -1 and 1, as the squares of complex
numbers need not be positive. In fact, the complex functions sin(z) 
and cos(z) are unbounded functions that achieve all possible 
complex numbers as outputs. 


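On the real line the two trigonometric functions appear unrelated to
the exponential. The latter is unbounded and not periodic, unlike
the former, but they are intimately related when seen through a
complex lens. Firstly, note that the powers of i proceed as follows:


\[ i^1 = i, \quad i^2 = -1, \quad i^3 = -i, \quad i^4 = 1, \quad i^5 = i, \quad \ldots, \]


repeating as i, -1, -i, 1, i, -1, -i, 1, ... with period four. So,


\[ \begin{aligned}
e^{iz} &= 1 + iz + \frac{(iz)^2}{2!} + \frac{(iz)^3}{3!} + \frac{(iz)^4}{4!} + \frac{(iz)^5}{5!} + \frac{(iz)^6}{6!} + \cdots \\
&= 1 + iz - \frac{z^2}{2!} - \frac{iz^3}{3!} + \frac{z^4}{4!} + \frac{iz^5}{5!} - \frac{z^6}{6!} - \cdots \\
&= \left( 1 - \frac{z^2}{2!} + \frac{z^4}{4!} - \cdots \right) + i \left( z - \frac{z^3}{3!} + \frac{z^5}{5!} - \cdots \right),
\end{aligned} \]


or, more succinctly,


\[ e^{iz} = \cos(z) + i\sin(z). \]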


This is Euler’s identity. In particular, if we set z to be $\pi$, and
remember we are using radians, we get


\[ e^{i\pi} = \cos(\pi) + i\sin(\pi) = -1 + 0i = -1. \]


This, particularly when written as $e^{i\pi} + 1 = 0$, is often rated as one
of the most beautiful equations in mathematics, as it connects the
fundamental numbers 0, 1, $\pi$, e, i.
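As a rough numerical check of the identities in this section, the following Python sketch sums the exponential power series directly and compares it with the library's complex sine and cosine. The helper name exp_series, the number of terms, and the test point z are illustrative assumptions, not part of the text.

```python
import cmath
import math


def exp_series(z, terms=60):
    """Partial sum 1 + z + z^2/2! + ... of the exponential power series."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # turn z^n/n! into z^(n+1)/(n+1)!
    return total


z = 0.7 + 0.4j  # arbitrary complex test point (an assumption)

# Euler's identity e^{iz} = cos(z) + i sin(z), checked at z.
euler_gap = abs(exp_series(1j * z) - (cmath.cos(z) + 1j * cmath.sin(z)))

# The special case e^{i pi} + 1 = 0.
pi_gap = abs(exp_series(1j * math.pi) + 1)

# cos^2 + sin^2 = 1 still holds, yet sin is unbounded off the real line.
pythagoras_gap = abs(cmath.cos(z) ** 2 + cmath.sin(z) ** 2 - 1)
size_of_sin_10i = abs(cmath.sin(10j))

print(euler_gap, pi_gap, pythagoras_gap, size_of_sin_10i)
```

The three gaps should be at the level of floating-point rounding, while the last value is far larger than 1, illustrating that the complex sine is not confined between -1 and 1.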


Taylor series and Laurent series 


Consider now the function


\[ g(z) = \frac{1}{1 - z}, \]


where z is a complex number. g(z) is differentiable for all complex
numbers, except when z = 1, where the function is not even 
defined (because we cannot divide by 0). This point, 1, where g(z) is 
not differentiable, is called a singularity. We will shortly see that 
these singularities play a key role in Cauchy’s theory of complex 
integration. 


Complex differentiable functions are analytic. So around any point 
where g(z) is differentiable, g(z) equals its Taylor series. In 
Chapter 2 we determined the Taylor series for g(z) about 0, namely 


\[ g(z) = 1 + z + z^2 + z^3 + \cdots. \]


This series doesn’t converge for all values of z, though; it converges
for those z inside the circle with centre 0 and radius 1 (Figure 58(a))
and does not converge outside that circle. And it is no coincidence 
that the singularity z = 1 is on this circle. Complex differentiable 
functions equal their Taylor series up to the nearest singularity. 


58(a). The disc of convergence of g(z) about 0. 58(b). The disc of
convergence of g(z) about i.


g(z) is also complex differentiable at i, and so g(z) equals its Taylor
series centred at i, which converges within the circle centred at i
and passing through the nearest singularity 1 (Figure 58(b)).
Specifically, that Taylor series is


\[ g(z) = \frac{1}{1-i} + \frac{z-i}{(1-i)^2} + \frac{(z-i)^2}{(1-i)^3} + \frac{(z-i)^3}{(1-i)^4} + \cdots. \]


(Again the first coefficient is g(i), the second is g'(i), etc., as with 
real Taylor series.) And if we return to the series which define the 
exponential, sine, and cosine functions, then these are their Taylor 
series centred at 0. Because these three functions are differentiable
at every point, they have no singularities. As there is no ‘nearest 
singularity’, these series converge on the entire complex plane. 
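The role of the nearest singularity can be seen numerically. The sketch below assumes the function under discussion is g(z) = 1/(1 - z), whose Taylor series about a centre c has coefficients 1/(1 - c)^(n + 1); the helper names and the sample point are illustrative choices. The point z = -1 + i lies outside the unit disc about 0 but inside the disc about i of radius sqrt(2), so only the series centred at i should converge there.

```python
def taylor_partial_sum(centre, z, terms):
    """Partial sum of the Taylor series of g(z) = 1/(1 - z) about `centre`:
    the sum over n of (z - centre)^n / (1 - centre)^(n + 1)."""
    ratio = (z - centre) / (1 - centre)
    total, power = 0j, 1 + 0j
    for _ in range(terms):
        total += power
        power *= ratio
    return total / (1 - centre)


def g(z):
    return 1 / (1 - z)


z = -1 + 1j  # |z| > 1, but |z - i| = 1 < sqrt(2)

about_i = taylor_partial_sum(1j, z, 200)  # centre i: z is inside the disc
about_0 = taylor_partial_sum(0, z, 200)   # centre 0: z is outside the disc

print(abs(about_i - g(z)), abs(about_0))
```

The first printed number should be tiny, while the second is enormous because the partial sums about 0 grow without bound at this point.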


A complex differentiable function has a Taylor series centred at any 
point where it is differentiable. But what can be said of a function 
centred at a singularity, where the function isn’t differentiable? 
Certainly the function cannot be expressed as a Taylor series, 
because power series define differentiable functions, but we can 
resolve this if we're willing to include negative powers in the series. 
Such a series is called a Laurent series, after the French 
mathematician Pierre Alphonse Laurent. 


As examples, consider 


\[ \frac{e^z}{z^2} = \frac{1}{z^2} + \frac{1}{z} + \frac{1}{2!} + \frac{z}{3!} + \frac{z^2}{4!} + \cdots, \]
\[ e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2!\,z^2} + \frac{1}{3!\,z^3} + \frac{1}{4!\,z^4} + \cdots. \]


The first Laurent series is found by dividing the exponential
series for $e^z$ by $z^2$ and the second is found by substituting 1/z
into the exponential series. Both functions are differentiable 
everywhere except when z = 0, where each has a singularity. 


More generally, Laurent’s theorem shows that a function f(z)
which has a singularity at z = a, but which is otherwise
differentiable in the vicinity of a, can be uniquely written as


\[ f(z) = \cdots + \frac{c_{-2}}{(z-a)^2} + \frac{c_{-1}}{z-a} + c_0 + c_1 (z-a) + c_2 (z-a)^2 + \cdots, \]


where the Laurent coefficients $\ldots, c_{-2}, c_{-1}, c_0, c_1, c_2, \ldots$ are
complex numbers. As well as there being potentially infinitely
many positive powers, it is entirely possible that there are also
infinitely many negative powers. The Laurent series for g(z),
centred at its only singularity 1, is fairly straightforward to
find, because


\[ g(z) = \frac{1}{1-z} = \frac{-1}{z-1} = -(z-1)^{-1}. \]


By the uniqueness of a Laurent series, this means that $c_{-1} = -1$,
and as no other power of z - 1 is present, all the other Laurent
coefficients are zero. We shall see in the next section that
the Laurent coefficient $c_{-1}$ is critically important.
This coefficient is called the residue, for reasons that will also
become apparent.


Complex integrals 


Integrals in the complex plane are defined in much the same
way as we defined line integrals in the real plane (Chapter 5).
Let C be a curve in the complex plane which begins at a and
ends at b (remember the direction of C is important
(Figure 59)), and let f(z) be a complex differentiable function
which has real part u(x, y) and imaginary part v(x, y). Then
we define


\[ \int_C f(z)\,dz = \int_C (u + iv)(dx + i\,dy) = \left( \int_C (u\,dx - v\,dy) \right) + i \left( \int_C (v\,dx + u\,dy) \right). \]


59. A path in the complex plane and its endpoints.


This may look somewhat complicated, but the two integrals in the 
right expression are the same as those line integrals we discussed in 
Chapter 5, and together are the real and imaginary parts of a 
complex number. 


Further, familiar results like the fundamental theorem of calculus
still apply. If F(z) is a complex differentiable function which has
derivative f(z), then


\[ \int_C f(z)\,dz = \int_a^b F'(z)\,dz = F(b) - F(a). \]


And there is a complex equivalent of Green’s theorem
known as Cauchy’s theorem (or Cauchy’s integral theorem).
This states that if C is a loop in the complex plane (a path
beginning and ending in the same point) and if f(z) is a function
which is complex differentiable inside and on the curve C, then


\[ \int_C f(z)\,dz = 0. \]


Ignoring some relatively minor technical details, this theorem is 
essentially equivalent to Green’s theorem (see the Appendix for details). 


We can now use these two theorems to evaluate an important
complex integral. Our curve C is the circle with centre 0 and radius
1, drawn anticlockwise, and we will evaluate


\[ \int_C z^n\,dz, \]


where n is a whole number. If $n \ge 0$, then we can use Cauchy’s
theorem; the function $z^n$ is differentiable inside and on C, and so
the above integral equals 0. If n < 0, then $z^n$ has a singularity at 0
which is inside C, and so we can’t use Cauchy’s theorem.


However, if $n \le -2$, we know that $z^n$ has antiderivative


\[ \frac{z^{n+1}}{n+1}. \]


Considering C as a curve that begins at a = 1, goes anticlockwise
about 0, and finishes again at b = 1, the fundamental theorem
gives


\[ \int_C z^n\,dz = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1} = \frac{1^{n+1}}{n+1} - \frac{1^{n+1}}{n+1} = 0. \]


The only remaining case, and the only case of interest, is when
n = -1 and this integral equals $2\pi i$.


Note (Figure 60) that every point on the circle C can be written as


\[ z = \cos(\theta) + i\sin(\theta) = e^{i\theta}. \]


If we consider C as beginning at 1, with $\theta = 0$, then as we go
around the circle anticlockwise $\theta$ increases until, at $\theta = 2\pi$, we
have gone once around C and are back at 1. Note


\[ \frac{dz}{d\theta} = -\sin(\theta) + i\cos(\theta) = i(\cos(\theta) + i\sin(\theta)) = ie^{i\theta}. \]


60. Parameterizing the circle.


So we have $z = e^{i\theta}$ and $dz = ie^{i\theta}\,d\theta$, yielding


\[ \int_C z^{-1}\,dz = \int_0^{2\pi} (e^{i\theta})^{-1} (ie^{i\theta})\,d\theta = \int_0^{2\pi} i\,d\theta = 2\pi i. \]


It’s not clear yet, but this particular integral is at the heart of 
many of the theorems of complex analysis. An alternative method 
of evaluation using the complex logarithm appears in the 
Appendix. 
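The calculation above can be mimicked numerically. The following Python sketch approximates the integral of z^n around the anticlockwise unit circle by a Riemann sum over the parameter theta, using z = e^{i theta} and dz = i e^{i theta} d(theta); the function name circle_integral and the step count are illustrative assumptions.

```python
import cmath
import math


def circle_integral(f, steps=2000):
    """Approximate the integral of f around the anticlockwise unit circle,
    using z = e^{i*theta}, dz = i e^{i*theta} d(theta) and a Riemann sum."""
    d_theta = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        z = cmath.exp(1j * k * d_theta)
        total += f(z) * 1j * z * d_theta
    return total


# Integrate z^n for n = -3, ..., 2; only n = -1 should give a non-zero answer.
integrals = {n: circle_integral(lambda z, n=n: z ** n) for n in range(-3, 3)}

for n, value in integrals.items():
    print(n, value)
```

Up to rounding error, every entry is 0 except the one for n = -1, which is close to 2*pi*i.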


Cauchy’s residue theorem 


We are now close to appreciating Cauchy’s residue theorem, one of
the most important theorems of complex analysis and a theorem
unlike anything found in real analysis. With C again denoting the
anticlockwise unit circle, from the previous calculation we might
expect


\[ \int_C \left( \cdots + \frac{c_{-2}}{z^2} + \frac{c_{-1}}{z} + c_0 + c_1 z + c_2 z^2 + \cdots \right) dz = 2\pi i\,c_{-1}. \]


If we can integrate ‘term-by-term’, then only the exceptional power
$z^{-1}$ leaves any contribution. This is why the coefficient $c_{-1}$ is called
the residue, being the only coefficient left over.


Justifying ‘term-by-term’ integration involves some
significant analysis; note again that this is a theorem of analysis
that shows the order of two limit processes (integrating and
summing infinitely many terms) can be interchanged. But once
demonstrated, we see that if f(z) is a function which is complex
differentiable inside and on the anticlockwise unit circle C, except
at 0, then


\[ \int_C f(z)\,dz = 2\pi i \times (\text{residue of } f(z) \text{ at } 0). \]


With a little more work we obtain the following: if C is any
anticlockwise loop and f(z) is a complex differentiable function
inside and on C, except at finitely many singularities, then


\[ \int_C f(z)\,dz = 2\pi i \times (\text{sum of the residues of } f(z) \text{ at the singularities inside } C). \]


This is Cauchy’s residue theorem at its most general. Note how 
‘elastic’ these integrals are—we could stretch the curve C in various 
ways and, as long as we don’t include new singularities or exclude 
current ones, the integral won’t change. 


In a university mathematics course, students learn a wide range of 
techniques to evaluate real integrals and infinite sums by making 
appropriate choices for the function f(z) and the curve C. Three 
such examples that can readily be approached using the residue 
theorem are


\[ \int_0^{2\pi} \frac{dx}{5 - 4\cos x} = \frac{2\pi}{3}, \qquad \int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}, \qquad \frac{1}{1^4} + \frac{1}{2^4} + \frac{1}{3^4} + \frac{1}{4^4} + \cdots = \frac{\pi^4}{90}. \]


These techniques are, in the main, well beyond the scope of this 
short text. But in general the aim is to reinterpret the above real 
integrals and infinite sums as complex integrals, or the real or 
imaginary parts of such integrals, and calculate these using the 
residue theorem. More detail on how the first integral can be 
evaluated is in the Appendix. 
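To give a flavour of the reinterpretation, here is a hedged Python sketch for the integral of 1/(5 - 4 cos x) over a full period. It uses one standard substitution, z = e^{ix}, which may differ in detail from the Appendix's treatment; the function and variable names are illustrative. The substitution turns the real integral into the integral of f(z) = 1/(-2i(z - 1/2)(z - 2)) around the anticlockwise unit circle, and only the singularity at z = 1/2 lies inside.

```python
import math


def real_integral(steps=2000):
    """Midpoint-rule estimate of the integral of 1/(5 - 4 cos x) over [0, 2*pi]."""
    h = 2 * math.pi / steps
    return sum(h / (5 - 4 * math.cos((k + 0.5) * h)) for k in range(steps))


# With z = e^{ix}: cos x = (z + 1/z)/2 and dx = dz/(iz), so the integrand
# becomes f(z) = 1 / (-2i (z - 1/2)(z - 2)). Its residue at z = 1/2 is the
# value of 1 / (-2i (z - 2)) at z = 1/2.
residue_at_half = 1 / (-2j * (0.5 - 2))
via_residues = 2j * math.pi * residue_at_half

print(real_integral(), via_residues.real, 2 * math.pi / 3)
```

All three printed values should agree to many decimal places, with the direct numerical estimate serving only as an independent check on the residue calculation.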


Conformal maps and applications 


We conclude this chapter with another application of complex 
analysis. Recall that the real and imaginary parts, u(x, y) and 
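v(x, y), of a complex differentiable function f(z) satisfy the
Cauchy-Riemann equations. There are two important
consequences of this.


One is that the functions u(x, y) and v(x, y) satisfy Laplace’s
equation,


\[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 = \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \]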


(see the Appendix for details), which is a particularly important 
partial differential equation in the study of fluid dynamics, gravity, 
and electromagnetism. 
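A quick finite-difference experiment makes this consequence plausible. The sketch below takes f(z) = sin(z) as the complex differentiable function (an illustrative choice), estimates the Laplacian of its real and imaginary parts with the usual five-point stencil, and contrasts them with a function that does not satisfy Laplace's equation. The sample point and step size are assumptions.

```python
import cmath


def u(x, y):
    """Real part of sin(x + iy)."""
    return cmath.sin(complex(x, y)).real


def v(x, y):
    """Imaginary part of sin(x + iy)."""
    return cmath.sin(complex(x, y)).imag


def laplacian(func, x, y, h=1e-3):
    """Five-point finite-difference estimate of func_xx + func_yy at (x, y)."""
    return (func(x + h, y) + func(x - h, y) + func(x, y + h) + func(x, y - h)
            - 4 * func(x, y)) / h ** 2


point = (0.3, 0.8)  # arbitrary sample point (an assumption)
lap_u = laplacian(u, *point)
lap_v = laplacian(v, *point)
lap_other = laplacian(lambda x, y: x * x + y * y, *point)  # not harmonic

print(lap_u, lap_v, lap_other)
```

The first two estimates should be close to 0, while the third is close to 4, the exact Laplacian of x^2 + y^2.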


The second consequence is that if f’(z) is non-zero, then f(z) is a
conformal map, meaning that f(z) preserves angles. To some
extent we have seen this before, when we looked at the effect of
$p(z) = z^2$ on the unit square (Figure 56(b)). The right angles that
were at 1, 1 + i, and i remain right angles at 1, 2i, and -1 in the
image; note, though, that the right angle at 0 becomes a half-angle
in the image, but this is because p’(z) = 2z = 0 when z = 0. So p(z)
isn’t conformal at 0.


A more complicated example is the effect that sine has on the
semi-infinite strip where $-\pi/2 < x < \pi/2$ and y > 0 (Figure 61(a)). The
function sin(z) takes this strip to the upper half of the complex
plane (Figure 61(b)). Further, the horizontal and vertical lines are
transformed into curves (specifically, ellipses and hyperbolas), but
importantly, the right angles in the diagram on the left
(Figure 61(a)), between the dashed horizontal and vertical lines,
remain as right angles in the right-hand diagram (Figure 61(b)).
Note that there are two points on the strip’s boundary where angles
are not preserved: namely, the right angles at $\pi/2$ and $-\pi/2$
become half-angles at 1 and -1 in the image. These are again
points where the derivative of sin(z), namely cos(z), is zero and
sin(z) isn’t conformal.


61(a). The semi-infinite strip. 61(b). Its image under sin(z).


The upper half-plane seems a nicer region to work with than the
strip; it’s possible to envisage how a fluid might flow horizontally in
the half-plane (Figure 62(a)). In Figure 62(b) we can see the
fluid flow in the strip which sin(z) transforms to the horizontal
flow in the half-plane. More generally, fluid flows in the strip
correspond, using sin(z), to fluid flows in the half-plane, and vice
versa.


62(a). Fluid flowing horizontally in the half-plane. 62(b). Fluid
flowing in the semi-infinite strip.


We recognize, then, that understanding fluid flows (or solutions to 
Laplace’s equation) in the strip is equivalent to understanding the 
same problems in the half-plane, a much nicer region. Indeed, this 
method is particularly powerful in light of the Riemann mapping 
theorem, which states that any region of the complex plane which 
isn’t the whole plane, and which doesn’t contain any holes, can be 
transformed into the half-plane. Solutions to Laplace’s equations 
in the half-plane are well understood, and so, in principle, are 
solutions in the regions to which the Riemann mapping theorem 
applies. I write ‘in principle’, because the maps necessary to 
transform a region into the half-plane can be computationally 
messy in practice. 


In the early days of flight, Nikolai Joukowski applied such theory to 
fluid flow around an aerofoil, such as the shape of an aeroplane’s 
wing. Fluid flow around a circular disc is relatively straightforward 
to model (Figure 63(a)). Joukowski introduced the following 
complex map: 


\[ f(z) = z + \frac{1}{z}, \]


now named after him, which can transform a disc to the shape of
an aerofoil (Figure 63(b)). The Joukowski map then relates the
fluid flow about the disc to the fluid flow about the aerofoil, and
vice versa. Joukowski was thereby able to explain how lift is
generated on an aerofoil.


63(a). Fluid flowing around a disc. 63(b). Fluid flowing around an
aerofoil.
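The effect of the map f(z) = z + 1/z on circles is easy to explore. In the Python sketch below, the particular offset circle (centre -0.1 + 0.1i, passing through z = 1) is an assumption chosen for illustration; the text does not specify which disc is used. The unit circle itself is flattened onto the real segment from -2 to 2, whereas the offset circle is sent to a closed curve with genuine thickness, of the kind used as an aerofoil profile.

```python
import cmath
import math


def joukowski(z):
    """The Joukowski map f(z) = z + 1/z."""
    return z + 1 / z


def circle_points(centre, radius, count=360):
    """Equally spaced points on the circle with the given centre and radius."""
    return [centre + radius * cmath.exp(2j * math.pi * k / count)
            for k in range(count)]


# The unit circle is squashed onto the real interval [-2, 2].
unit_image = [joukowski(z) for z in circle_points(0, 1)]

# A slightly offset circle through z = 1 (an illustrative assumption)
# is sent to an aerofoil-like closed curve.
centre = -0.1 + 0.1j
aerofoil = [joukowski(z) for z in circle_points(centre, abs(1 - centre))]

print(max(abs(w.imag) for w in unit_image))
print(max(w.imag for w in aerofoil), min(w.imag for w in aerofoil))
```

Plotting the real and imaginary parts of the points in aerofoil with any plotting tool shows the familiar rounded nose and sharp trailing edge.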


The study of fluid dynamics and flight has moved on enormously
since Joukowski’s time, but complex analysis remains a rich and 
important area of mathematics and science. It has important 
applications within mathematics—from number theory (more on 
this in the next chapter) through to profound connections with 
geometry and topology—and provides an important toolkit in 
diverse fields such as differential equations, solid and fluid 
dynamics, acoustics, imaging, and computer vision.


But there’s more...


In this final chapter we touch on some further developments in the 
field of analysis that were not discussed in earlier chapters. 


Lebesgue integration 


Riemann’s integration theory, whilst a significant advance, did not 
prove adequate for the purposes of 20th-century analysis. It was 
largely sufficient for calculating the integral of any function that 
might arise from a real-world problem, but did not prove 
sufficiently comprehensive and powerful for the needs of 
mathematics. The difference here is that, in seeking to prove a 
general theorem in analysis, a mathematician begins with arbitrary 
integrable functions and may wish to create new functions from 
them by algebraic and analytic methods. Unfortunately, it is easy to 
step outside the comfort zone of Riemann integration, producing 
functions which are not Riemann integrable. 


In 1829, Dirichlet introduced a function which is not Riemann 
integrable. This function is certainly one that wouldn’t arise from 
modelling reality, but is the sort of pathological example which 
mathematicians need to be concerned with when seeking to prove 
general theorems. Dirichlet’s function D(x) is defined on the
interval $0 \le x \le 1$ by


\[ D(x) = \begin{cases} 1 & \text{if } x \text{ is rational,} \\ 0 & \text{if } x \text{ is irrational.} \end{cases} \]


64. Dirichlet’s function.


Recall that a real number is called rational if it is a fraction of two 
whole numbers such as — 4 or 2; otherwise, it is termed irrational. 
The rational numbers are so dense within the real numbers that a 
graph of D(x) would appear—to the naked eye at least—as two 
solid lines (Figure 64). 


A step function satisfying $\phi(x) \le D(x)$ can be at most 0, except for
finitely many rational x, where $\phi(x)$ can be as great as 1; this is
because every interval of positive length includes irrational
numbers. This means that the lower Riemann integral of D(x)
equals 0. Likewise, a step function satisfying $\psi(x) \ge D(x)$ must be
at least 1, except for finitely many irrational x, where $\psi(x)$ can be as
small as 0. So the upper Riemann integral equals 1. As these values
aren’t equal, D(x) is not Riemann integrable.


It may not appear important that such an esoteric function lies
outside Riemann’s theory; however, whilst D(x) isn’t integrable, it
is the limit of a sequence of integrable functions. The rational
numbers are countable and so can be listed as $q_1, q_2, q_3, q_4, \ldots$
A sequence of step functions $\phi_n(x)$ can then be created by setting
$\phi_n(x) = 1$ at $q_1, q_2, \ldots, q_n$, and otherwise equalling 0. These step
functions converge to D(x); we are essentially pushing points from
the x-axis up to the graph of D(x) one at a time. The problem is
that these step functions are Riemann integrable, but their limit,
D(x), isn’t.


That non-integrable functions can be created so easily from 
integrable functions is a severe weakness for Riemann’s theory. 
In 1902, Henri Lebesgue (Figure 65(a)) introduced a much more 
comprehensive theory of integration with powerful convergence 
theorems. Further, his theory naturally encompassed integrals 
of unbounded functions and functions on unbounded intervals 
such as 


which could only be treated in Riemann’s theory as improper 
integrals—that is, as limits of Riemann integrals. 


Lebesgue also proved a result which neatly places Riemann’s 
theory of integration within his own theory: a bounded function 
f(x) on the interval $a \le x \le b$ is Riemann integrable if and only if its
discontinuities form a set of measure zero. By contrast, Dirichlet’s
function is discontinuous on the entire interval $0 \le x \le 1$, which has
measure 1. 


Measure theory 


Lebesgue’s theory of integration is grounded in the notion of 
measure, as developed by his tutor Émile Borel, and substantially
extended by Lebesgue. Measure is a generalization of length, area,
and volume to other dimensions, which encompasses esoteric sets
well beyond the familiar. Lebesgue integration comfortably
handles Dirichlet’s function, because D(x) is 0 except on the
rational numbers between 0 and 1, and the rationals form a set of
measure zero, also known as a null set. 


65(a). Lebesgue. 65(b). Constructing Cantor’s set.


A subset of the real numbers is a null set if, given any positive 
number $\varepsilon$, the set can be covered by intervals whose total length is
at most $\varepsilon$. This definition captures the fact that the set has ‘total
length’ less than any given positive number, and because measure 
can’t be negative, the only remaining possibility is zero. The null 
sets are the ‘negligible’ sets of integration: for example, two 
functions which differ only on a null set have the same integral. 
Countable sets, like the rationals, are null (see the Appendix), so, 
because D(x) differs from the zero function only on a null set, they 
have the same integral of zero. 


There are null sets which are uncountable. One such set is the 
Cantor set C, which is constructed recursively (Figure 65(b)). We 
begin with $C_0$ being the whole interval $0 \le x \le 1$. We create $C_1$ by
removing the middle third of $C_0$, create $C_2$ by removing the middle
thirds of the two intervals making up $C_1$, and so on. The Cantor set is
the collection of points which lie in every $C_n$. Because $C_n$ comprises
$2^n$ intervals of length $(1/3)^n$, the measure of $C_n$ equals $(2/3)^n$. This
becomes arbitrarily small as n increases and so C has measure zero. 
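The recursive construction translates directly into code. The following Python sketch builds the stages of the construction with exact fractions and confirms the count and total length of the intervals at each stage; the function names are illustrative.

```python
from fractions import Fraction


def remove_middle_thirds(intervals):
    """Replace each interval (a, b) by its left and right closed thirds."""
    result = []
    for a, b in intervals:
        third = (b - a) / 3
        result.append((a, a + third))
        result.append((b - third, b))
    return result


def cantor_stage(n):
    """The intervals making up the n-th stage, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = remove_middle_thirds(intervals)
    return intervals


def total_length(intervals):
    return sum(b - a for a, b in intervals)


for n in range(6):
    stage = cantor_stage(n)
    print(n, len(stage), total_length(stage))
```

Each line of output shows 2^n intervals with total length (2/3)^n, which shrinks towards zero as n grows.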


The study of measure, as a generalization of length, area, volume 
etc., led to the discovery of some rather pathological examples
within mathematics. Given a two-dimensional square with side 1,
its area equals 1. If we are asked for the volume of the square, the
only reasonable answer seems to be 0; as a three-dimensional
object, the square has dimensions 1, 1, 0 and so volume
$1 \times 1 \times 0 = 0$. If asked for its length, one justifiable answer is
infinity; the top edge of the square has length 1, but the square 
itself is made up of infinitely many horizontal line segments, each 
of length 1. 


The square is two-dimensional. What is happening here is that 
when we evaluate the measure of the square in a higher dimension 
than 2, we get 0, and in a lower dimension we get infinity. In 1918,
Felix Hausdorff generalized the notion of measure to all dimensions,
not just to whole numbers. Hausdorff associated with a given set a
number $d \ge 0$, now known as its Hausdorff dimension. The measure
of the set in any dimension lower than d is infinite and in any
dimension above d it is 0. The measure of the set in dimension d
may be finite or infinite. Unsurprisingly, the Hausdorff dimension
of a square is 2. More surprisingly, there are sets whose dimensions
are not whole numbers; such a set is the Cantor set, which has
Hausdorff dimension $d = \log(2)/\log(3) = 0.6309\ldots$ The Cantor set’s
measure is 0 in a higher dimension, infinite in a lower dimension,
and 1 in dimension d. The Cantor set is an example of a fractal, the
word coined to reflect its ‘fractional dimension’.
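The three-way behaviour described here can be illustrated, though not proved, with a simple calculation. The n-th stage of the Cantor construction consists of 2^n intervals of length 3^(-n), and the sketch below sums (length)^s over those intervals for several exponents s. These covering sums are only a heuristic stand-in for Hausdorff measure, whose actual definition takes an infimum over all coverings; the function name is an illustrative assumption.

```python
import math

d = math.log(2) / math.log(3)  # the Cantor set's Hausdorff dimension


def covering_sum(n, s):
    """Sum of (length)^s over the 2^n intervals of length 3^(-n) in stage n."""
    return (2 * 3 ** (-s)) ** n


print(round(d, 4))
for s in (0.5, d, 0.8):
    print(round(s, 4), [covering_sum(n, s) for n in (10, 20, 30)])
```

For s below d the sums grow without bound, for s above d they shrink towards 0, and exactly at s = d they stay at 1.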


Lebesgue’s theory is incredibly comprehensive—Lebesgue 
measurable sets and functions include any that can be explicitly 
constructed. In 1905 Giuseppe Vitali showed the existence of a 
non-measurable set; he used a further axiom of set theory called 
the axiom of choice, which makes statements about the existence of 
sets in a non-constructive way. Vitali proved that any value 
assigned for the measure of his set would lead to a contradiction. 


Most mathematicians are comfortable assuming the axiom of 
choice, and view mathematics as too limited without it; this is 
despite the axiom leading to the existence of non-measurable sets.


Further, in 1933 Andrey Kolmogorov used measure theory to lay
the foundations of probability theory, where consequently there 
are random occurrences to which no probability can be assigned. 
Analysis would also help provide the technical language for 
Brownian motion, with the development of stochastic calculus. 
Brownian motion is named after the Scottish botanist Robert 
Brown to describe the random motion of particles in a medium. 
Random walks, where a particle can move discrete steps at discrete 
time intervals, are relatively straightforward to model. But there 
are technical difficulties in describing a continuous version of such 
random walks by a particle making infinitesimal steps. In 1923 
Norbert Wiener gave a mathematical definition of Brownian 
motion. 


A yet more startling result from 1924 is the Banach-Tarski 
paradox, which states that a solid sphere can be broken down into 
finitely many pieces and these pieces can then be moved and 
rotated and reassembled as two solid spheres of the same size as 
the original, thus doubling the volume. This theorem at first seems 
wholly ridiculous, as we expect that volume should be preserved; 
however, if the pieces are non-measurable, there is no reason that 
volume need stay invariant during the decomposition. Perhaps 
more surprising is that this paradox is true in three dimensions, 
but not in two. It is possible to assign a measure to all the subsets 
into which a disc might be finitely decomposed, and so area must 
remain an invariant of the decomposition. The reason for the 
difference is algebraic and relates to the interaction of rotations 
and translations, which is less complicated in two dimensions than 
it is in three dimensions. 


The bar chart (Figure 66) shows what may happen when a fair coin 
is tossed 20 times. The number of heads achieved is between 0 and 
20 (on the horizontal axis) and the bars’ heights represent the 
associated probabilities. Note the bar chart approximates a bell 
curve, technically a normal distribution, also sketched. As more 
coin tosses are repeated, the bar chart approximates the bell curve
better and better. This fact was understood as early as the 18th
century by Abraham de Moivre, including in the case where the
possibilities (heads or tails) are not equally likely.


66. Central limit theorem.


Getting one head from a coin toss, or zero if it lands tails, is an 
example of a random variable. More generally, a random variable 
might be a die roll (a whole number between one and six), the 
number of people in a bus queue (still a whole number, but now 
with a larger range), or today’s temperature (which has a 
continuous range of possibilities). If we repeatedly roll a die, then 
we expect to achieve an average score, long term, of around 


\[ \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5. \]


We recognize there will be periods of good and bad luck, but in the 
long term they will cancel out. This fact is precisely captured by the 
central limit theorem, which is one of the most important 
theorems in probability and statistics. A general version was 
proved by Aleksandr Lyapunov in 1900-1, which applies generally
to all random variables. In the long term, a large sample taken of a
random variable will approximate a normal distribution with the 
expected average and spread. But a precise statement of the 
theorem, and its proof, are very much analytical in nature. 


Analytic number theory 


Analytic number theory is usually dated back to Dirichlet’s 
theorem of 1837. Recall that a prime number is a whole number 
$p \ge 2$ whose only factors are 1 and p. The list of primes begins


2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, ..., 


and it has been known from ancient times that there are infinitely 
many prime numbers (see Appendix). It’s a natural generalization 
of this fact to ask whether there are infinitely many primes among 
arithmetic sequences such as 


4, 7, 10, 13, 16, 19, ...,     2, 5, 8, 11, 14, 17, ...,     5, 9, 13, 17, 21, 25, ...


An arithmetic sequence is one that grows with the same common 
difference each time—this is 3 in the first two examples above 
and 4 in the third. Clearly, if the first term and the common 
difference have a factor in common—for example, the first term 
is 3 and the common difference is 6—then every term after will 
be divisible by that common factor and not be a prime. But if the 
first term and common difference have no common factor other 
than 1, it’s plausible that there are infinitely many primes in the 
sequence. 


Euler, in 1737, used some relatively straightforward analysis to 
prove that the infinite sum of the primes’ reciprocals,


\[ \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \frac{1}{11} + \cdots, \]


does not converge. Consequently, there are infinitely many primes;
were there only finitely many, then the sum of their reciprocals
would be finite. Note that the converse is not true; there may be
infinite but suitably sparse numbers whose reciprocals have a finite
sum, as with the square numbers or powers of two. Indeed, in 1915,
in an effort to prove there are infinitely many twin primes (primes
two apart such as 17, 19 and 101, 103), Viggo Brun considered
the sum of their reciprocals only to show this sum is finite. It
remains unknown as to whether there are infinitely many twin 
primes or not. 
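The contrast between the two sums can be glimpsed numerically, although no finite computation can establish divergence or convergence. The Python sketch below uses simple trial division (an illustrative choice; the function names are assumptions) to form partial sums of the reciprocals of all primes and of the twin primes up to a limit.

```python
def is_prime(n):
    """Trial division; adequate for the small limits used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_reciprocal_sum(limit):
    """Sum of 1/p over primes p <= limit."""
    return sum(1 / p for p in range(2, limit + 1) if is_prime(p))


def twin_prime_reciprocal_sum(limit):
    """Sum of 1/p + 1/(p + 2) over twin prime pairs with p + 2 <= limit."""
    return sum(1 / p + 1 / (p + 2)
               for p in range(3, limit - 1)
               if is_prime(p) and is_prime(p + 2))


for limit in (10, 100, 1000, 10000):
    print(limit, prime_reciprocal_sum(limit), twin_prime_reciprocal_sum(limit))
```

The first column of sums keeps creeping upwards, very slowly, while the twin prime sums increase ever more slowly and remain below 2.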


Dirichlet sought, using analytic methods, to prove the sum 
of the reciprocals of primes appearing in arithmetic sequences 
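does not converge. For the earlier examples these would be
the sums


\[ \frac{1}{7} + \frac{1}{13} + \frac{1}{19} + \frac{1}{31} + \cdots, \qquad \frac{1}{2} + \frac{1}{5} + \frac{1}{11} + \frac{1}{17} + \cdots, \qquad \frac{1}{5} + \frac{1}{13} + \frac{1}{17} + \frac{1}{29} + \cdots. \]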


To do this, he introduced what are now known as Dirichlet series, 
infinite sums of the form 


\[ a_1 + \frac{a_2}{2^z} + \frac{a_3}{3^z} + \frac{a_4}{4^z} + \frac{a_5}{5^z} + \cdots, \]


where z is a complex variable and $a_1, a_2, a_3, \ldots$ is a sequence of
complex numbers. Dirichlet series remain an important tool.
The most important unsolved problem in mathematics is the
Riemann hypothesis, which concerns the zeros of the Riemann
zeta function, which is the Dirichlet series when each $a_n$ equals 1.
The hypothesis has important implications for the distribution of 
the prime numbers. 


More recently, in 2004, Ben Green and Terence Tao proved a
result of a converse nature to Dirichlet’s theorem; they showed the
existence of arbitrarily long arithmetic sequences among the prime
numbers. The theorem only demonstrates existence and finding
such sequences is difficult; in 2019 an arithmetic sequence of 27 
primes was discovered and its first term is over 224 quadrillion! 


At first glance, the prime numbers appear to arise randomly. There 
is a sense they get scarcer, but I doubt whether anything seems 
immediately apparent. By studying tables of prime numbers, 
various mathematicians—Euler, Gauss, Legendre—were led to 
conjecture that, for large n, the number of primes less than n is 
approximately


\[ \frac{n}{\log(n)}. \]


Loosely speaking, this says that the chance of a number n being
prime is 1/log(n), which decreases as n increases. This fact is now
called the prime number theorem; its proof was another success
for analytic number theory. In 1896 Jacques Hadamard and 
Charles-Jean de la Vallée Poussin independently proved the 
theorem, each using complex analytic methods. 
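The approximation can be tested on a small scale. The Python sketch below counts primes with a sieve of Eratosthenes and compares the count with n/log(n), where log is the natural logarithm; the function names are illustrative, and the count includes n itself if n happens to be prime, which makes no difference for the powers of ten used.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            multiples = range(p * p, limit + 1, p)
            is_prime[p * p :: p] = bytearray(len(multiples))
    return [n for n in range(limit + 1) if is_prime[n]]


def compare(n):
    """Return (number of primes up to n, n / log(n))."""
    return len(primes_up_to(n)), n / math.log(n)


for n in (10 ** 3, 10 ** 4, 10 ** 5):
    count, estimate = compare(n)
    print(n, count, round(estimate, 1), round(count / estimate, 3))
```

The ratio in the last column drifts slowly towards 1, consistent with the theorem, although the convergence is too slow for small tables to be conclusive.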


Hyperreal numbers 


The hyperreals are an extension of the real numbers that allow a 
rigorous treatment of infinities and infinitesimals. They were 
developed in 1966 by Abraham Robinson in his work Non-Standard 
Analysis, which built on earlier work of Edwin Hewitt from 1948. 


A hyperreal number can be represented by a sequence of real 
numbers such as 


\[ (1, 2, 3, 4, \ldots), \qquad \left(1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots\right), \qquad (1, 1, 1, 1, \ldots). \]


A real number x is identified with the constant sequence
(x, x, x, x, ...), so that the third hyperreal above is identified with 1.
The first hyperreal above is an infinity, because the sequence tends
to infinity, and the second hyperreal is an infinitesimal, because
the sequence converges to 0. Hyperreals add and multiply termwise,
so the product of the first two hyperreals equals the third. 


Unfortunately, there are considerable technical issues with this 
approach. Firstly, there are just too many sequences; for example, 
it quickly becomes apparent that the sequence (0, 1,1,1, ...) also 
represents 1. So, two sequences are defined as representing the 
same hyperreal number if ‘most’ of their terms agree: this notion of 
‘most’ is again technical and involves introducing a measure on the 
counting numbers; each subset of the counting numbers is given 
measure 0 or 1, and a subset contains ‘most’ of the counting 
numbers if it has measure 1. Two hyperreals can then be ordered if 
most of the terms of one hyperreal are less than those of the second 
hyperreal. 


A second issue arises from products like 
(1,0,1,0, ...) x (0,1,0,1, ...) = (0,0,0,0, ...). 


In a field, two non-zero elements cannot multiply to give zero. The 
first hyperreal agrees with (1,1,1,1, ...) for odd terms and with 
(0,0,0,0, ...) for even terms. Depending on our choice of ‘most’, 
either the set of odd numbers has measure 1 and the set of even 
numbers has measure 0, or vice versa. In the first case, this means 
that the first hyperreal is identified with 1 and the second with 0 
and the above product says nothing more startling than 1 x 0 = 0. 


Within this framework it is then possible to define what an
infinitesimal or infinite hyperreal is. This in turn opens an
entirely different treatment of calculus. Any finite hyperreal can
be uniquely written as a + ε where a is real, known as its
standard part, and ε is an infinitesimal. A real function f(x)
can be naturally extended to a function on the hyperreals, and
f can be shown to be differentiable if

$$\frac{f(x + \varepsilon) - f(x)}{\varepsilon}$$

has the same standard part for all non-zero infinitesimals ε, that standard
part equalling f'(x). Seemingly, after centuries of looking to avoid
infinitesimals, the story of analysis had come full circle.
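
As a brief worked illustration (an added example, not from the original text), take the function that squares its input and any non-zero infinitesimal, written here as $\varepsilon$:

$$\frac{(x + \varepsilon)^2 - x^2}{\varepsilon} = \frac{2x\varepsilon + \varepsilon^2}{\varepsilon} = 2x + \varepsilon.$$

For real x the standard part of this quotient is 2x whichever infinitesimal is chosen, which recovers the familiar derivative 2x.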


Epilogue 


The history of analysis teaches us that mathematics is its own 
subject (or perhaps, instead of ‘mathematics’, that should read 
‘mathematics as done by humans’). In a mannerly fashion, by and 
large, mathematics has listened to and been influenced by its 
friends—physics, philosophy, computer science, etc.—but has 
ultimately come to its own conclusions. The notion of function 
developed through its use in modelling real-world situations, but 
the modern notion of function is much more advanced than 
everyday situations require and has ultimately been developed to 
serve the purposes of mathematics. Even when analysis is being 
used to solve a real-world problem, a proof may require theory to 
be developed well beyond the scope of the problem itself. Further, 
there is a pure instinct for mathematicians to generalize widely, 
something which sets them apart from scientists studying this 
universe. 


In undergraduate mathematics degrees, analysis is the course 
where students learn their trade of rigorous argument. More than 
any other course, it makes clear the step up to higher education, as 
students are made to justify their arguments in detail, or provide 
counter-examples to claims they previously thought were obviously 
true. In due course, however, students come to appreciate the 
rigour that analysis courses instil. 


To quote the mathematical historian Morris Kline, ‘development 
must be preceded by a period of free, numerous, disconnected, 
often accidentally discovered, and perhaps disordered creations’. 
As we have seen in much of this text, whether grappling with 
infinity, handling Fourier’s new approach to modelling heat, 
making explicit the notions of point masses, or using Von 
Neumann’s drawing together of different approaches to quantum 
theory, mathematical analysis has commonly provided some 
certainty, closure, and confidence—secure foundations upon 
which future generations of mathematicians could build. 


Appendix 


Collected in this appendix are a number of asides and some results 
that were considered too calculation-heavy for the main text. But 
hopefully they are approachable and of interest to a number of 
readers. 


Chapter 1 


Ramanujan’s approximation to π

In 1910 Ramanujan gave the following approximation to π. Using
the first term gives π to six decimal places, with every subsequent
term providing eight more decimal places.

$$\pi = \frac{9801}{\sqrt{8}} \left( 1103 + \frac{4!\,(1103 + 1 \times 26390)}{(1!)^4\, 396^4} + \frac{8!\,(1103 + 2 \times 26390)}{(2!)^4\, 396^8} + \frac{12!\,(1103 + 3 \times 26390)}{(3!)^4\, 396^{12}} + \cdots \right)^{-1},$$

where k! = 1 × 2 × 3 × ⋯ × k.
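
The speed of convergence can be seen by evaluating the series directly. The sketch below assumes the standard form of Ramanujan's series, in which the k-th term is (4k)!(1103 + 26390k) / ((k!)^4 396^(4k)); the function name and the working precision of 40 digits are illustrative choices.

```python
from decimal import Decimal, getcontext
from math import factorial

def ramanujan_pi(terms, digits=40):
    """Approximate pi using the first `terms` terms of Ramanujan's series."""
    getcontext().prec = digits
    total = Decimal(0)
    for k in range(terms):
        numerator = factorial(4 * k) * (1103 + 26390 * k)
        denominator = factorial(k) ** 4 * 396 ** (4 * k)
        total += Decimal(numerator) / Decimal(denominator)
    # pi = 9801 / (sqrt(8) * sum of the series)
    return Decimal(9801) / (Decimal(8).sqrt() * total)

for terms in (1, 2, 3):
    print(terms, ramanujan_pi(terms))
```

Exact integer arithmetic is used for the factorials, with `Decimal` only at the division step, so rounding does not mask the gain from each extra term.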


Cantor’s proof that the real numbers are uncountable 

We will prove that the real numbers are uncountable by showing
that there are uncountably many real numbers in the interval
0 < x < 1. The proof shows that any list of numbers from this range
misses some real number and so cannot be complete.

Let x₁, x₂, x₃, x₄, ... be an infinite list of such numbers, with their
decimal expansions. For example, a list might begin

x₁ = 0.278190632...
x₂ = 0.215249718...
x₃ = 0.815311166...
x₄ = 0.957285101...

Our task is to create a number x in the interval 0 < x < 1 which is
not on this list. Or, put another way, x must not equal
x₁, x must not equal x₂, x must not equal x₃, and so on.

We can make sure that x does not equal x₁ by having the decimal
expansion of x begin with 0.5 rather than 0.2. A 5 rather than a
2 in the tenths column ensures that x does
not equal x₁. We then use the hundredths column to make sure
x does not equal x₂. We might continue the decimal expansion
of x as 0.53, so that the second decimal digit of 3 is different
from that of x₂, which is 1.

And we continue in this fashion, at the nth stage choosing the nth
decimal place of x to be different from the nth decimal place
of xₙ, and thereby ensuring that x does not equal xₙ. In this way we
see that x is nowhere on the list; we can do this for any list, and so
no list can be complete.
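
The diagonal construction can be sketched for a finite portion of a list. This is an illustration only: the rule used here (choose 5, or 6 if the digit is already 5) is one arbitrary choice among many, and `diagonal_digits` is a hypothetical name. Avoiding the digits 0 and 9 sidesteps numbers with two decimal expansions, such as 0.4999... = 0.5.

```python
def diagonal_digits(expansions):
    """Return digits of a number differing from the nth expansion in its nth place.

    `expansions` is a list of strings holding the digits after "0.".
    """
    digits = []
    for n, expansion in enumerate(expansions):
        # Choose 5 unless the nth digit is already 5, in which case choose 6.
        digits.append("6" if expansion[n] == "5" else "5")
    return "".join(digits)

listed = ["278190632", "215249718", "815311166", "957285101"]
x = diagonal_digits(listed)
print("0." + x)  # prints 0.5565
```

The result differs from the first listed number in its first decimal place, from the second in its second place, and so on, which is all the argument requires.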


Because this interval alone cannot be counted, the real numbers 
can’t all be counted either. Cantor gave this simpler proof in 1890, 
his first proof dating back to 1874. 


The axioms of the real numbers 

The real numbers are a set, denoted by ℝ, together with the
operations of addition + and multiplication × and a relation < ‘is
less than’, providing a notion of order, which satisfy the following
axioms (= assumed rules).

• Field axioms

1. + is associative: x + (y + z) = (x + y) + z for any x, y, z.
2. + is commutative: x + y = y + x for any x, y.
3. There is an element 0 such that x + 0 = x for all x.
4. For any real number x there is y such that x + y = 0.
5. × is associative: x × (y × z) = (x × y) × z for any x, y, z.
6. × is commutative: x × y = y × x for any x, y.
7. There is a second element 1 such that x × 1 = x for all x.
8. For any non-zero real number x there is y such that x × y = 1.
9. × distributes over +: x × (y + z) = x × y + x × z for any x, y, z.


Any set, with operations of + and ×, satisfying these nine axioms
is called a field. The real numbers, the rational numbers, the
complex numbers (Chapter 7) and hyperreals (Chapter 8) are
all examples of fields. The integers don’t meet axiom 8 as there
is no y when x = 2.


• Order axioms

1. For any x, y, if 0 < x and 0 < y, then 0 < x + y.
2. For any x, y, if 0 < x and 0 < y, then 0 < x × y.
3. For any x, precisely one of the following is true:
   0 < x or 0 = x or x < 0.


Any field with an order relation < which satisfies these three 
axioms is called an ordered field. The real numbers, the rational 
numbers, and the hyperreals are ordered fields. But no such 
order < exists on the complex numbers. 


• Completeness axiom

Any bounded, increasing sequence of real numbers
x₁ ≤ x₂ ≤ x₃ ≤ x₄ ≤ ⋯ converges to a limit.


The rational numbers do not satisfy the completeness axiom. For
example, consider the increasing sequence

3, 3.1, 3.14, 3.141, 3.1415, ...

of decimal approximations of π. Because the real numbers satisfy
the completeness axiom, this sequence must have a limit, namely π.
The above is an increasing sequence of rational numbers which lie
between 3 and 4, and which, because π is irrational, doesn’t have a
limit among the rational numbers.


The real numbers can be described by other sets of axioms, but 
these different approaches are ultimately equivalent. Alternatively, 
some authors might begin with axioms for the counting numbers, 
deducing the properties of the rational numbers and the real 
numbers as theorems. 


Chapter 2 


The equation of the cissoid of Diocles 

Referring to Figure 4(b), a general point R on the tangent line has
co-ordinates (t, 2a). The point Q lies on the line OR, and so has
co-ordinates (ct, 2ca), where c is chosen so that Q is at distance a
from the circle’s centre (0, a). This means that

$$(ct)^2 + (2ca - a)^2 = a^2.$$

Solving this gives c = 4a²/(t² + 4a²). The point P lies on the line OR
such that OP equals QR, so that P = ((1 − c)t, 2(1 − c)a).
The co-ordinates of P are then

$$x = \frac{t^3}{t^2 + 4a^2}, \qquad y = \frac{2at^2}{t^2 + 4a^2}.$$

This is a parametric description of the cissoid in terms of t. But we
can eliminate t by noting t = 2ax/y; substituting this expression for t
into either equation and rearranging gives

$$(x^2 + y^2)\,y = 2ax^2.$$
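
A quick numerical check of the elimination, as a sketch only: it takes the parametric form x = t³/(t² + 4a²), y = 2at²/(t² + 4a²) that follows from the construction, and confirms that sample points satisfy (x² + y²)y = 2ax². The function names and the sample values of t and a are illustrative.

```python
def cissoid_point(t, a):
    """Point P on the cissoid for parameter t and circle radius a."""
    d = t * t + 4 * a * a
    return t ** 3 / d, 2 * a * t * t / d

def implicit_residual(x, y, a):
    """Left-hand side minus right-hand side of (x^2 + y^2) y = 2 a x^2."""
    return (x * x + y * y) * y - 2 * a * x * x

a = 1.5
for t in (-3.0, -0.5, 0.7, 2.0, 10.0):
    x, y = cissoid_point(t, a)
    # Each residual should vanish up to floating-point rounding.
    print(t, implicit_residual(x, y, a))
```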

The rules of differentiation
For differentiable functions f(x) and g(x) and a constant c, the
following rules hold:

• f(x) + g(x) is differentiable with derivative f'(x) + g'(x).
• cf(x) is differentiable with derivative cf'(x).
• f(x)g(x) is differentiable with derivative f'(x)g(x) + f(x)g'(x).
• f(g(x)) is differentiable with derivative f'(g(x))g'(x).
• f(x)/g(x) is differentiable with derivative

$$\frac{g(x)f'(x) - f(x)g'(x)}{g(x)^2}.$$

The last three rules are referred to as the product rule, the chain
rule, and the quotient rule; the quotient rule requires g(x) to be
non-zero for f(x)/g(x) to be defined. These rules are commonly
credited to Leibniz who was aware of them by 1677.


Chapter 3 


Basic identities of the exponential and logarithmic functions 

1. For a fixed but arbitrary real number a, applying the product and
chain rules shows the function $f(x) = e^{x+a}e^{-x}$ has zero derivative
and so is constant. As $f(0) = e^a$, then $e^{x+a}e^{-x} = e^a$ for all x, a.
Replacing a with x + a and x with −a, we see $e^{x+a} = e^x e^a$ for all
real x, a.

2. For positive r, s, we write $r = e^x$ and $s = e^a$, to obtain

$$\log(rs) = \log(e^x e^a) = \log(e^{x+a}) = x + a = \log(r) + \log(s).$$

3. Recall that $y = e^x$ satisfies dy/dx = y, so that

$$\frac{dx}{dy} = \frac{1}{dy/dx} = \frac{1}{y}.$$

Because x = log(y), the derivative of log(y) (with respect to y) is 1/y.

4. Given a > 0 and real x, we define $a^x = e^{x\log(a)}$. By the chain rule, $a^x$
differentiates to

$$(\log(a))\,e^{x\log(a)} = (\log(a))\,a^x.$$


A trigonometric identity 

The derivatives of sin(x) and cos(x) are cos(x) and −sin(x)
respectively, and the derivative of x² is 2x. So, by the chain rule, the
derivative of (sin(x))² + (cos(x))² is

$$2(\sin(x))(\cos(x)) + 2(\cos(x))(-\sin(x)) = 0.$$

At x = 0, (sin(x))² + (cos(x))² = 0² + 1² = 1. Because only
constant functions have zero derivative, (sin(x))² + (cos(x))² = 1
for all values of x.


Rederiving Madhava’s sum 


The tangent tan(x) of an angle x is defined by

$$\tan(x) = \frac{\sin(x)}{\cos(x)},$$

so that tan(π/4) = 1 as sin(π/4) = cos(π/4) = 1/√2. Now inverse
tangent (or arc tangent) y = tan⁻¹(x) is defined as the value in the
range −π/2 < y < π/2 such that tan(y) = x. In particular, we have
tan⁻¹(1) = π/4.

The rules of differentiation applied to the equation tan(y) = x show

$$\frac{dy}{dx} = \frac{1}{1 + x^2}.$$

We found the power series for 1/(1 − x) in Chapter 3; replacing
x with −x², we find

$$\frac{dy}{dx} = \frac{1}{1 + x^2} = 1 - x^2 + x^4 - x^6 + \cdots.$$

Noting y(0) = 0 and applying term-by-term integration, we finally
obtain

$$\tan^{-1}(x) = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots.$$

This series is valid for −1 < x ≤ 1 and setting x = 1 yields Madhava’s
sum for π.
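
Setting x = 1 in the inverse tangent series gives π/4 = 1 − 1/3 + 1/5 − 1/7 + ⋯, and the partial sums can be computed directly. A minimal sketch, with a hypothetical function name and arbitrary term counts:

```python
from math import pi

def madhava_partial_sum(terms):
    """4 * (1 - 1/3 + 1/5 - ...) using the given number of terms."""
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(terms))

for terms in (10, 100, 1000, 10000):
    approx = madhava_partial_sum(terms)
    # The error shrinks roughly like 1/terms, so convergence is slow.
    print(terms, approx, abs(approx - pi))
```

The slow convergence contrasts sharply with Ramanujan's series earlier in this appendix.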


e is irrational 


We will use Euler’s definition for e, namely

$$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!} + \cdots.$$

So, for any n, we have

$$0 < e - \left(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}\right) = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \frac{1}{(n+3)!} + \cdots.$$

The right-hand side is less than

$$\frac{1}{(n+1)!}\left(1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right) = \frac{1}{(n+1)!} \times \frac{1}{1 - \frac{1}{n+1}} = \frac{1}{n(n!)},$$

using the power series for 1/(1 − x) found in Chapter 3. Hence

$$0 < e - \left(1 + \frac{1}{1!} + \frac{1}{2!} + \cdots + \frac{1}{n!}\right) < \frac{1}{n(n!)}$$

for any n. If e = m/n were rational, then we could multiply the
above by n! to obtain

$$0 < n!\,\frac{m}{n} - \left(n! + n! + \frac{n!}{2!} + \frac{n!}{3!} + \cdots + 1\right) < \frac{1}{n}.$$

This is a contradiction, as the middle expression is a whole
number, but there is no whole number strictly between 0 and 1/n.
The assumption that e is rational leads to a contradiction, and
hence e is irrational.


Euler’s first solution of the Basel problem 
Note firstly that

$$\left(1 - \frac{x}{r_1}\right)\left(1 - \frac{x}{r_2}\right) = 1 - \left(\frac{1}{r_1} + \frac{1}{r_2}\right)x + \frac{x^2}{r_1 r_2}.$$

The roots of the quadratic on the left are r₁ and r₂, and the sum of
their reciprocals equals negative the coefficient of x on the right.
And this remains true for finitely many brackets and general
polynomials. However, the result is not true of power series; for
example, the equation $e^x = 0$ has no solutions, yet the coefficient of
x in the exponential series is 1.

Nonetheless, Euler noted that

$$\frac{\sin\sqrt{x}}{\sqrt{x}} = 1 - \frac{x}{3!} + \frac{x^2}{5!} - \frac{x^3}{7!} + \cdots$$

by using the Taylor series for sine. As sine equals zero at
π, 2π, 3π, ... the above power series equals zero when
x = π², (2π)² = 2²π², (3π)² = 3²π², ... He then ‘concluded’ that
their reciprocals add to negative the coefficient of x yielding

$$\frac{1}{\pi^2} + \frac{1}{2^2\pi^2} + \frac{1}{3^2\pi^2} + \cdots = \frac{1}{\pi^2}\left(\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right) = -\left(\frac{-1}{3!}\right) = \frac{1}{6}.$$

Rearranging this last equation would give S = π²/6 for the Basel
sum S = 1/1² + 1/2² + 1/3² + ⋯. Euler could
approximate S by evaluating some partial sums to check the
plausibility of his answer, but this provides only supporting
evidence and doesn’t constitute a rigorous proof.
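
The kind of plausibility check described here is easy to reproduce. The following sketch, with a hypothetical function name, compares partial sums of 1/1² + 1/2² + 1/3² + ⋯ with π²/6; like Euler's own check, it is supporting evidence rather than proof.

```python
from math import pi

def basel_partial_sum(terms):
    """Sum of 1/n^2 for n = 1, ..., terms."""
    return sum(1 / n ** 2 for n in range(1, terms + 1))

target = pi ** 2 / 6
for terms in (10, 100, 1000, 100000):
    partial = basel_partial_sum(terms)
    # The shortfall is roughly 1/terms.
    print(terms, partial, target - partial)
```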


Chapter 4 


Deriving Newton’s method 
The gradient of the graph y = f(x) at (x₁, f(x₁)) equals f'(x₁), so
the tangent line at this point has equation

$$y - f(x_1) = f'(x_1)(x - x_1).$$

This line crosses the x-axis at (x₂, 0). By substituting x = x₂ and
y = 0 into the equation and solving for x₂, we derive Newton’s
iteration:

$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}.$$

If f(x) is increasing at x = a, and the derivative f'(x) is increasing
on the interval a ≤ x ≤ x₁, then the iterations xₙ will converge
quadratically as a decreasing sequence to the solution a.
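
The iteration can be sketched in a few lines. The example function f(x) = x² − 2, the starting value 2, and the name `newton` are illustrative assumptions, chosen so that f and its derivative are both increasing to the right of the root √2.

```python
def newton(f, df, x, steps=6):
    """Apply Newton's iteration `steps` times, returning every iterate."""
    history = [x]
    for _ in range(steps):
        x = x - f(x) / df(x)
        history.append(x)
    return history

iterates = newton(lambda x: x * x - 2, lambda x: 2 * x, 2.0)
for value in iterates:
    # The number of correct digits roughly doubles at each step.
    print(value)
```

The early iterates decrease towards √2, as the convergence statement above predicts; once machine precision is reached, further steps change nothing of significance.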


The stable Lotka-Volterra equilibrium 
At an equilibrium for the Lotka-Volterra equations, we have
F'(t) = 0 = R'(t) so that

$$0 = -mF + aFR, \qquad 0 = bR - kFR.$$

There are two solutions to these equations: (F, R) = (0, 0) and
(F, R) = (b/k, m/a).

To analyse the second equilibrium, we set

$$F = \frac{b}{k} + \varepsilon_1, \qquad R = \frac{m}{a} + \varepsilon_2,$$

where ε₁ and ε₂ are small enough that we will consider only linear
terms involving them. Substituting these expressions back into the
Lotka-Volterra equations, we get

$$\varepsilon_1' = -m\left(\frac{b}{k} + \varepsilon_1\right) + a\left(\frac{b}{k} + \varepsilon_1\right)\left(\frac{m}{a} + \varepsilon_2\right) = \frac{ab}{k}\,\varepsilon_2.$$

This last equality is found by expanding the brackets and ignoring
the negligible term aε₁ε₂. A similar calculation for ε₂' gives
ε₂' = −(mk/a)ε₁. Combining these equations, we have

$$\varepsilon_1'' = \left(\frac{ab}{k}\right)\varepsilon_2' = \left(\frac{ab}{k}\right)\left(-\frac{mk}{a}\right)\varepsilon_1 = -mb\,\varepsilon_1.$$

The general solution to this differential equation is
ε₁ = A sin(√(mb) t) + B cos(√(mb) t) for constants A and B,
and we can use ε₂ = (k/ab)ε₁'
to find a similar expression for ε₂. These expressions for ε₁ and ε₂
describe small oval (specifically elliptical) orbits around the
equilibrium point, showing it to be stable.


Chapter 5 


Minimizing the least-squares error 
If we expand the brackets in the expression for the total error E(a, b)
from Chapter 4, with n data points (rather than 6), we find that

$$E(a, b) = Sa^2 + 2Tab + nb^2 - 2Aa - 2Bb + C,$$

where

$$S = x_1^2 + x_2^2 + \cdots + x_n^2, \qquad T = x_1 + x_2 + \cdots + x_n,$$

$$A = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n, \qquad B = y_1 + y_2 + \cdots + y_n,$$

$$C = y_1^2 + y_2^2 + \cdots + y_n^2.$$

Now E(a, b) is minimal when ∂E/∂a = 0 = ∂E/∂b. Differentiating,
we find that

$$\frac{\partial E}{\partial a} = 2Sa + 2Tb - 2A = 0, \qquad \frac{\partial E}{\partial b} = 2Ta + 2nb - 2B = 0.$$

These simultaneous equations can then be solved to find the
optimal values for a and b.
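
Solving the pair Sa + Tb = A, Ta + nb = B explicitly gives a = (nA − TB)/(nS − T²) and b = (SB − TA)/(nS − T²). The sketch below applies these formulas; the function name and the sample data are illustrative, and the fitted line is taken to be y = ax + b, consistent with the expansion of E(a, b).

```python
def least_squares_line(xs, ys):
    """Return (a, b) minimising the total squared error of y = a*x + b."""
    n = len(xs)
    S = sum(x * x for x in xs)
    T = sum(xs)
    A = sum(x * y for x, y in zip(xs, ys))
    B = sum(ys)
    det = n * S - T * T  # zero only if all the x values coincide
    a = (n * A - T * B) / det
    b = (S * B - T * A) / det
    return a, b

# Points lying exactly on y = 2x + 1 should give a = 2, b = 1.
print(least_squares_line([0, 1, 2, 3], [1, 3, 5, 7]))
```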


Lines are the shortest curves 
In this example $F(x, f, f') = \sqrt{1 + (f')^2}$. Note F does not depend
on f, and so ∂F/∂f = 0. By the chain rule, we have that

$$\frac{\partial F}{\partial f'} = \frac{1}{2}\left(1 + (f')^2\right)^{-1/2}(2f') = \frac{f'}{\sqrt{1 + (f')^2}}.$$

So the Euler-Lagrange equation now reads

$$\frac{d}{dx}\left(\frac{f'}{\sqrt{1 + (f')^2}}\right) = 0.$$

Finally, f(x) = x solves this as f'(x) = 1, so that the expression in
the brackets equals 1/√2; this is a constant, meaning its derivative
is zero and so equals the right-hand side.
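
The curl of a conservative field is zero

Given a vector field v = (v_x, v_y, v_z), curl(v) is the vector field
defined by

$$\operatorname{curl}(\mathbf{v}) = \left(\frac{\partial v_z}{\partial y} - \frac{\partial v_y}{\partial z},\ \frac{\partial v_x}{\partial z} - \frac{\partial v_z}{\partial x},\ \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y}\right).$$

If v = grad(f) = (∂f/∂x, ∂f/∂y, ∂f/∂z), then

$$\operatorname{curl}(\mathbf{v}) = \left(\frac{\partial^2 f}{\partial y\,\partial z} - \frac{\partial^2 f}{\partial z\,\partial y},\ \frac{\partial^2 f}{\partial z\,\partial x} - \frac{\partial^2 f}{\partial x\,\partial z},\ \frac{\partial^2 f}{\partial x\,\partial y} - \frac{\partial^2 f}{\partial y\,\partial x}\right) = (0, 0, 0),$$

because, in each case, the order of differentiation does not
matter.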


Stokes’ theorem implies Green’s theorem 
Green’s theorem states that for a region R in the xy-plane, and
functions P(x, y) and Q(x, y),

$$\int_{\partial R} P\,dx + Q\,dy = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA.$$

Green’s theorem is a special case of Stokes’ theorem as follows. Set
v = (P(x, y), Q(x, y), 0). Because R lies in the plane, the unit
normal is (0, 0, 1), so the component of curl(v) in this normal
direction is just the z-component of curl(v). Setting v_x = P and
v_y = Q in the above definition for curl, we find

$$\iint_R \operatorname{curl}(\mathbf{v}) \cdot d\mathbf{S} = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dA.$$

On the other hand, dr = (dx, dy, 0) is an infinitesimal tangent
vector to the boundary ∂R and

$$\mathbf{v} \cdot d\mathbf{r} = (P, Q, 0) \cdot (dx, dy, 0) = P\,dx + Q\,dy.$$

Putting these expressions into Stokes’ theorem gives Green’s theorem.


Chapter 6 


Deriving the Fourier coefficients 
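Setting t = 0 in the expression for y(x, t) as an infinite sum
of yₙ(x, t), we get

$$y(x, 0) = f(x) = A_1 \sin\left(\frac{\pi x}{L}\right) + A_2 \sin\left(\frac{2\pi x}{L}\right) + A_3 \sin\left(\frac{3\pi x}{L}\right) + \cdots.$$

The trigonometric identity

$$2\sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right) = \cos\left(\frac{(m-n)\pi x}{L}\right) - \cos\left(\frac{(m+n)\pi x}{L}\right)$$

can be used to show that

$$\int_0^L \sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right) dx = 0,$$

when m and n are distinct. And when m = n, this integral becomes

$$\frac{1}{2}\int_0^L \left(1 - \cos\left(\frac{2n\pi x}{L}\right)\right) dx = \frac{L}{2}.$$

If we multiply the infinite sum for y(x, 0) by sin(nπx/L) and
integrate between x = 0 and x = L, then all the integrals equal
zero, except the nth one. Specifically, we get

$$\int_0^L f(x)\sin\left(\frac{n\pi x}{L}\right) dx = \frac{A_n L}{2},$$

giving Fourier’s expression for Aₙ.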
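
The resulting formula, that the nth coefficient equals 2/L times the integral of f(x) sin(nπx/L) from 0 to L, can be tested numerically. This sketch uses a simple midpoint rule; the function names, the test function, and the number of sample points are all illustrative assumptions.

```python
from math import sin, pi

def sine_coefficient(f, n, L=1.0, samples=2000):
    """Estimate (2/L) * integral of f(x) sin(n pi x / L) over [0, L]."""
    dx = L / samples
    total = 0.0
    for j in range(samples):
        x = (j + 0.5) * dx  # midpoint of the jth subinterval
        total += f(x) * sin(n * pi * x / L)
    return (2 / L) * total * dx

def example(x):
    """A shape built from known coefficients: 1 for n = 1 and 0.5 for n = 3."""
    return sin(pi * x) + 0.5 * sin(3 * pi * x)

for n in (1, 2, 3, 4):
    print(n, round(sine_coefficient(example, n), 6))
```

The estimates recover the coefficients used to build the example, and come out as essentially zero for every other n.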

Chapter 7 


Deriving the Cauchy-Riemann equations 
Let f(z) = u(x, y) + v(x, y)i be a complex differentiable function
where z = x + yi. Then

$$f'(z) = \text{the limit of } \frac{f(z + h) - f(z)}{h} \text{ as } h \text{ becomes small}$$

exists and gives the same value however h converges to zero. In the
case when h is real, then z + h = (x + h) + yi and so

$$\frac{f(z + h) - f(z)}{h} = \frac{[u(x + h, y) + v(x + h, y)i] - [u(x, y) + v(x, y)i]}{h}$$

$$= \frac{u(x + h, y) - u(x, y)}{h} + \frac{v(x + h, y) - v(x, y)}{h}\,i,$$

which has limit

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}\,i$$

as h becomes small.

If instead h = ik is purely imaginary, then k becomes small when h
becomes small; also note that z + h = x + i(y + k). So

$$\frac{f(z + h) - f(z)}{h} = \frac{[u(x, y + k) + v(x, y + k)i] - [u(x, y) + v(x, y)i]}{ki}$$

$$= \frac{v(x, y + k) - v(x, y)}{k} - \frac{u(x, y + k) - u(x, y)}{k}\,i,$$

which has limit

$$\frac{\partial v}{\partial y} - \frac{\partial u}{\partial y}\,i$$

as h, and so k, becomes small.

As these two limits must be equal, their real parts are equal and
separately their imaginary parts are equal, so we've proved the
Cauchy-Riemann equations

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}.$$


Proving Cauchy’s theorem from Green’s theorem 

Let f(x + yi) = u(x, y) + iv(x, y) be a complex differentiable
function with real part u(x, y) and imaginary part v(x, y) and let C
be an anticlockwise loop in the complex plane. Cauchy’s theorem
states that

$$\int_C f(z)\,dz = 0.$$

This can be deduced from Green’s theorem. Recall that Green’s
theorem states that

$$\int_C P\,dx + Q\,dy = \iint_R \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy,$$

where R is the region bounded by C, and P(x, y) and Q(x, y) are
real differentiable functions. Further

$$f(z)\,dz = (u + iv)(dx + i\,dy) = (u\,dx - v\,dy) + i(v\,dx + u\,dy).$$

If we apply Green’s theorem separately to the real and imaginary
parts of f(z) dz, we get

$$\int_C f(z)\,dz = \iint_R \left(-\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right) dx\,dy + i\iint_R \left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right) dx\,dy,$$

and both integrals are zero from the Cauchy-Riemann equations.


Complex logarithm and powers 

An important consequence of Euler’s identity is that the
complex exponential function is periodic. As sine and cosine
have period 2π, the exponential function has period 2πi,
meaning $e^{z + 2\pi i} = e^z$ for any z. This is very different behaviour
compared with the real numbers (Chapter 3).

The complex exponential attains all possible outputs except zero;
so for any non-zero complex number z, there is a complex number
w such that

$$e^w = z.$$

For z, a positive real number, there is a unique real number w such
that $e^w = z$ and w is called the logarithm of z, written log(z). By
contrast, with complex numbers, if w solves $e^w = z$, then so does
w + 2πi, as do w + 4πi and w − 2πi. In fact, there are infinitely
many values w that solve the equation. Which of these should we
think of as the logarithm of z?


Objectively, none of these w is a preferred solution for $e^w = z$, so
mathematicians get around this problem by making a choice—a 
so-called principal value—and then working consistently with that 
choice. In this way it is possible to define a choice of complex 
logarithm log(z) which is a complex differentiable function and has 
derivative 1/z. 


This issue with the complex logarithm also applies to powers of
complex numbers. For example, there is no single, preferred value
we might assign to $i^i$. From Euler’s identity we know that

$$e^{i\pi/2} = \cos(\pi/2) + i\sin(\pi/2) = 0 + i \times 1 = i,$$

and so we might argue that

$$i^i = \left(e^{i\pi/2}\right)^i = e^{i^2\pi/2} = e^{-\pi/2},$$

and this is indeed a possible correct answer. But it is also the case that
$e^{5\pi i/2} = e^{-3\pi i/2} = i$, due to the periodicity of the exponential function,
and these lead to different values for $i^i$. In fact, there are infinitely
many possible values for $i^i$; perhaps surprisingly, all of them are real. But
again it is possible, for any complex number a, to define a preferred
complex differentiable function $z^a$ which has derivative $az^{a-1}$.
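
The different real values can be listed explicitly. Each logarithm of i has the form (π/2 + 2πk)i for a whole number k, so the corresponding value of i to the power i is e^(−(π/2 + 2πk)). The sketch below computes a few of them; the function name is hypothetical, and the comparison with Python's built-in power relies on Python using the principal value.

```python
import cmath
import math

def i_to_the_i(k):
    """Value of i**i obtained from the logarithm (pi/2 + 2*pi*k) i of i."""
    log_i = complex(0, math.pi / 2 + 2 * math.pi * k)
    return cmath.exp(1j * log_i)

for k in (-1, 0, 1):
    # Each value has zero imaginary part.
    print(k, i_to_the_i(k))

# Python's own complex power uses the principal value, matching k = 0.
print((1j) ** 1j)
```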


An appreciation of the complex logarithm gives further insight into
why $\int_C z^{-1}\,dz$ equals 2πi, where C denotes the anticlockwise unit
circle. n = −1 is a special case, because in each other case $z^n$ has
antiderivative $z^{n+1}/(n+1)$. This doesn’t make sense when n = −1, but we
know that $z^{-1}$ has antiderivative log(z).


But we can’t smoothly choose principal values for log(z) on all of C. 
If instead we think of C as starting at a = 1, circling the origin and 
returning to a distinct end point b = 1, then we can define log(z) 
for each point of C (Figure 67). 


Recall that log(z) is a value w such that $e^w = z$. At a = 1 we can set
log(a) = 0, as $e^0 = 1$. And at $z = e^{i\theta}$ we set log(z) = iθ. As we move
anticlockwise around C, this choice of log(z) changes smoothly;
when we get to the top of C we have log(i) = iπ/2, and half-way
around C we find log(−1) = iπ. But as we continue around the
lower semi-circle of C, we see that θ is approaching 2π and log(z) is
approaching 2πi. Even though a = 1 = b, we assign log(b) = 2πi,
as b is at the ‘end’ of C. By the fundamental theorem,

$$\int_C \frac{dz}{z} = \log(b) - \log(a) = 2\pi i - 0 = 2\pi i.$$

[Figure 67. Logarithm on the circle. The unit circle in the complex plane (axes Re and Im), labelled with log(a) = 0, log(e^{iθ}) = iθ, log(i) = iπ/2, log(−1) = iπ, and log(−i) = 3iπ/2.]


Put another way, the above integral takes this value precisely 
because it’s the period of the exponential function. 


Evaluating an integral with the residue theorem 
The first integral listed in Chapter 7 is

$$\int_0^{2\pi} \frac{dx}{5 - 4\cos x}.$$

To turn this into a complex integral, we set $z = e^{ix}$. As x increases
from 0 to 2π, then z moves once anticlockwise around the unit
circle C. Now $dz = ie^{ix}\,dx = iz\,dx$ and, by Euler’s identity,

$$\cos x = \frac{e^{ix} + e^{-ix}}{2} = \frac{z + z^{-1}}{2}.$$

Substituting these expressions for dx and cos x into the integral, it
rearranges to the complex integral

$$\int_C \frac{i\,dz}{2z^2 - 5z + 2} = \int_C \frac{i\,dz}{2(z - 2)\left(z - \frac{1}{2}\right)}.$$

The integrand here has a singularity at z = ½ which,
importantly, is inside C and one at z = 2 which is outside
of C. The residue theorem shows the integral equals
2πi × (residue at ½). Finally, the Laurent series centred
at ½ begins

$$-\frac{i}{3}\cdot\frac{1}{z - \frac{1}{2}} - \frac{2i}{9} - \frac{4i}{27}\left(z - \frac{1}{2}\right) - \cdots$$

and so the residue at ½ equals −i/3. Hence the integral equals

$$2\pi i \times \left(\text{residue at } \tfrac{1}{2}\right) = 2\pi i \times \left(-\frac{i}{3}\right) = \frac{2\pi}{3}.$$

The real and imaginary parts satisfy Laplace’s equation 
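If f(x + yi) = u(x, y) + iv(x, y) is complex differentiable, the
Cauchy-Riemann equations hold, that is,

$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y}.$$

The order of differentiation in mixed derivatives does not
matter, so

$$\frac{\partial^2 u}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x}\right) = \frac{\partial}{\partial x}\left(\frac{\partial v}{\partial y}\right) = \frac{\partial}{\partial y}\left(\frac{\partial v}{\partial x}\right) = \frac{\partial}{\partial y}\left(-\frac{\partial u}{\partial y}\right) = -\frac{\partial^2 u}{\partial y^2},$$

showing u satisfies Laplace’s equation. The same result for v
follows similarly.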

Chapter 8 


A countable set is null 
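The elements of a countable set can be listed as r₁, r₂, r₃, ... Given
any positive ε we can cover each rₖ with the interval centred on rₖ
and having length $\varepsilon/2^k$. The total length of these intervals is

$$\frac{\varepsilon}{2} + \frac{\varepsilon}{4} + \frac{\varepsilon}{8} + \frac{\varepsilon}{16} + \cdots = \varepsilon,$$

as shown in Chapter 1. As ε can be chosen as small as we like, the
set is null.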


There are infinitely many primes 

Euclid proved in his Elements that there are infinitely many prime 
numbers. The proof does not involve analysis but is a classic of 
mathematics, and so is included here. Let p₁, p₂, ..., pₙ be finitely
many prime numbers. Euclid then introduced the number

$$N = p_1 \times p_2 \times \cdots \times p_n + 1.$$

Note that none of p₁, p₂, ..., pₙ divides N, because each leaves
remainder 1.

There are two alternatives: either N is itself a new prime (as arises
from 2 × 3 × 5 + 1 = 31) or the prime factors of N are new
prime numbers (as arises with 2 × 3 × 5 × 7 × 11 × 13 + 1
= 30031 = 59 × 509). In either case, we have found a new prime
number not on our list, and this process can be continued
indefinitely.
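
Euclid's construction can be carried out mechanically. In this sketch the function name is hypothetical and the smallest prime factor is found by simple trial division, which is adequate only for small examples.

```python
from math import prod

def new_prime_from(primes):
    """Return the smallest prime factor of (product of primes) + 1."""
    n = prod(primes) + 1
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            return factor  # the smallest divisor above 1 is always prime
        factor += 1
    return n  # no divisor found, so n itself is prime

print(new_prime_from([2, 3, 5]))             # prints 31
print(new_prime_from([2, 3, 5, 7, 11, 13]))  # prints 59
```

In both cases the prime returned is absent from the list supplied, exactly as the argument guarantees.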

Historical timeline 


c.370 BCE Eudoxus develops Method of Exhaustion for calculating 
areas 


c.270 BCE Archimedes shows 3 + 10/71 < π < 3 + 1/7 


c.1400 Madhava and the Kerala school develop power series 

1614 Napier introduces logarithms 

c.1620 Harriot investigates continuous compounding and the 
exponential function 

1636 Fermat’s method of tangents 

1637 Descartes introduces Cartesian co-ordinates 

1668 James Gregory gives the first proof of a limited version of 
the Fundamental Theorem of Calculus 

1669 Newton shares his De Analysi in a limited way 

1673 Leibniz coins the term ‘function’ 

1684 Leibniz’s first publication on calculus 

1687 Newton publishes his Principia 

1696 Johann Bernoulli poses the brachistochrone problem 

1703 Guido Grandi studies the sum 1 − 1 + 1 − 1 + ⋯ 

1715 Brook Taylor publishes on Taylor series 

1734 Bishop Berkeley’s The Analyst 

1737 Euler shows that e is irrational 

1747 D’Alembert derives the wave equation 

1748 Euler’s Identity in Introductio in Analysin Infinitorum 

1757 D’Alembert first writes down the Cauchy-Riemann
equations. Later developed by Euler (1797), Cauchy (1814),
and Riemann (1851)

1760 Lagrange poses Plateau’s problem

1761 Johann Lambert proves that π is irrational

1768 Euler’s method appears in Institutionum Calculi Integralis

1799 Caspar Wessel first represents complex numbers in a plane

1801 Gauss uses least-squares method to relocate the asteroid
Ceres

1813 Argand publishes a proof of the Fundamental Theorem of
Algebra (Gauss is often cited as having proved the theorem
earlier, in 1799, but his proof was incomplete, with a
significant topological gap)

1817 Bolzano defines convergence and continuity and proves the
Intermediate Value Theorem

1821 Cauchy’s Cours d'Analyse discusses limits and attempts to
formalize analysis

1822 Fourier’s Analytical Theory of Heat introduces Fourier
series

1825 Cauchy publishes his integral theorem

1826 Ostrogradsky proves the divergence theorem

1828 Green’s theorem appears in An Essay on the Application of
Mathematical Analysis to the Theories of Electricity and
Magnetism

1829 Dirichlet’s memoir on the convergence of Fourier series

1829 Dirichlet constructs a function which isn’t Riemann
integrable

1837 Dirichlet founds analytic number theory

1837 Dirichlet shows an absolutely convergent infinite sum
always rearranges to the same limit

1843 Laurent’s theorem

1850 William Thomson (Lord Kelvin) writes to Stokes with what
is now referred to as Stokes’ theorem

1853 Riemann proves his rearrangement theorem

1854 Riemann defines his theory of integration (published 1868)

1861 Weierstrass lectures on ε–δ analysis

1870 Weierstrass proves the isoperimetric inequality

1872 Weierstrass describes a continuous function that is
differentiable nowhere

1873 Continuous function found with Fourier series that doesn’t
converge

1873 Hermite shows that e is transcendental

1874 Cantor proves that the real numbers are uncountable

1875 Darboux reformulates Riemann’s integral

1882 Ferdinand von Lindemann proves that π is transcendental

1890 Cesaro generalizes the notion of convergence of infinite
sums

1896 Prime number theorem proved using analytic methods

1899 J. Willard Gibbs’ ‘phenomenon’ of poor Fourier series
convergence near discontinuities

1900 Lyapunov proves a general version of the central limit
theorem

1902 Lebesgue defines his theory of integration

1905 Vitali gives an example of a non-measurable set

1910 Ramanujan’s approximation to π

1910 Joukowski studies aerofoils using complex analysis

1913 Ramanujan further generalizes the notion of convergence of
infinite sums

1918 Hausdorff measure and Hausdorff (fractal) dimension
defined

1923 Wiener models continuous-time Brownian motion

1924 Banach and Tarski publish their paradox

1925 Lotka and Volterra (1926) model predator-prey dynamics

1930 Dirac introduces his delta function in The Principles of
Quantum Mechanics

1932 Von Neumann’s Mathematical Foundations of Quantum
Mechanics

1932 Banach publishes Theory of Linear Operations, a seminal
work in functional analysis

1936 Sobolev first introduces distributions

1945 Elie Cartan generalizes Stokes’ theorem

1950 Schwartz wins a Fields Medal for his work on distributions

1966 Abraham Robinson publishes Non-standard Analysis,
introducing the hyperreals

1970 Osserman completes proof of Plateau’s problem

1992 Different, homophonically indistinguishable drums found

2004 Ben Green and Terence Tao prove their theorem
most fundamental of ideas. 


‘Splendidly pellucid.’ 


Steven Poole, The Guardian 


www.oup.com/vsi 


CHAOS 


A Very Short Introduction 
Leonard Smith 


Our growing understanding of Chaos Theory is having 
fascinating applications in the real world - from technology to 
global warming, politics, human behaviour, and even gambling 
on the stock market. Leonard Smith shows that we all have an 
intuitive understanding of chaotic systems. He uses accessible 
maths and physics (replacing complex equations with simple 
examples like pendulums, railway lines, and tossing coins) to 
explain the theory, and points to numerous examples in 
philosophy and literature (Edgar Allen Poe, Chang-Tzu, Arthur 
Conan Doyle) that illuminate the problems. The beauty of fractal 
patterns and their relation to chaos, as well as the history of 
chaos, and its uses in the real world and implications for the 
philosophy of science are all discussed in this Very Short 
Introduction. 


‘...Chaos...will give you the clearest (but not too painful idea) of 
the maths involved... There’s a lot packed into this little book, and 
for such a technical exploration it’s surprisingly readable and 
enjoyable - | really wanted to keep turning the pages. Smith also 
has some excellent words of wisdom about common 
misunderstandings of chaos theory...’ 


popularscience.co.uk 
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